WEDNESDAY, MAY 11, 1960


HOUSE OF REPRESENTATIVES,
COMMITTEE ON SCIENCE AND ASTRONAUTICS,
INVESTIGATING SUBCOMMITTEE,
Washington, D.C. 

The subcommittee met at 11:15 a.m., Hon. Overton Brooks (chair- 
man) presiding. 

The CHAIRMAN. The subcommittee will come to order.

This morning I want to say, Dr. Waterman—addressing myself 
to you; also through you to the others who have had to wait—we are 
very sorry. It is very regretful that we were compelled to ask you 
to wait, but we did have an important executive matter to dispose of. 

Dr. WATERMAN. We understand, sir.

The CHAIRMAN. Therefore, for the resulting wait, we apologize
very much. 

We are very happy to have Dr. Alan Waterman, head of the Na- 
tional Science Foundation, here with us this morning as a leadoff 
witness in this new hearing, “Machine Translation Research.” 

Dr. Waterman, I believe we have a prepared statement from you. 

Dr. WATERMAN. Yes, sir.

The CHAIRMAN. Without further ado, because we are running so
late, we would like to proceed with your statement. 

Dr. WATERMAN. All right, sir.


STATEMENT OF DR. ALAN T. WATERMAN,¹ DIRECTOR, NATIONAL
SCIENCE FOUNDATION 


Mr. Chairman, members of the committee, we appreciate this oppor- 
tunity to discuss the matter of research in the field of mechanical 
translation. I shall first summarize the highlights of the history 


¹ Biographical sketch of Dr. Alan T. Waterman:

Dr. Alan T. Waterman was appointed Director of the National Science Foundation by 
the President of the United States on April 6, 1951. 

From 1946 to 1951, Dr. Waterman was with the Office of Naval Research, Department 
of the Navy, in the position of Deputy Chief and Chief Scientist. 

During World War II, Dr. Waterman served as Vice Chairman of Division D and as 
assistant to member, National Defense Research Committee. From 1943 to 1945, he was 
Deputy Chief and later Chief of the Office of Field Service, Office of Scientific Research 
and Development. 

A graduate of Princeton University, A.B., 1913, Dr. Waterman received the degree of 
doctor of philosophy in physics from Princeton in 1916. During the next year he was 
instructor in physics at the University of Cincinnati. After 2 years’ military service 
(private to first lieutenant) with the Science and Research Division of the Army Signal
Corps in World War I, he joined the faculty of Yale University and remained in the
department of physics there until 1942, with leave of absence during 1927-28 on a national
research fellowship to King’s College, London, England, and to the Massachusetts Institute 
of Technology in 1937. From 1942 to 1946 he was on leave from Yale with the Office of 
Scientific Research and Development. Dr. Waterman holds honorary degrees of doctor of 
science from Tufts College, Northeastern University, the University of Vermont, and State 
of this field of research, both in the United States and in other countries,
and then discuss briefly the interests and activities of the
National Science Foundation with respect to mechanical translation 
research. With your permission, Dr. Adkinson, Head of our Office 
of Science Information Service, will then discuss the different 
approaches and objectives in this field of research and its relation to 
the broader field of applied linguistics and mechanized information 
processing in general. We shall then be pleased to answer any ques- 
tions you may have. 

More than 10 years ago scientists in various countries, including the 
United States, had begun to consider the possibility of translation 
through the use of digital computers. As early as 1947, at the Institute
for Advanced Studies, Princeton, N.J., a program was written
which would make it possible for a computer to look up words in a 
dictionary automatically. In July 1949, a memorandum entitled
“Translation” by Warren Weaver, of the Rockefeller Foundation,
brought the idea of mechanical translation before many people who 
had not previously considered it and aroused widespread interest. In 
1951, Yehoshua Bar-Hillel, at the Massachusetts Institute of Tech- 
nology, became the first full-time paid research worker in the field of 
mechanical translation research. 

Meanwhile, other workers continued to devote part of their time to 
this research, and in 1952 the first conference on mechanical transla- 
tion research was held at MIT, we understand, with the financial sup- 
port of the Rockefeller Foundation. In 1954, the National Science 
Foundation, recognizing the potential importance of this work, made 
a grant to MIT in support of a mechanical translation research group. 
In 1956, another conference devoted to mechanical translation was 
held at MIT, this time with the support of the National Science 
Foundation. 

Since 1954, many other research groups have undertaken the study 
of the problems inherent in automatic translation by computers; at 


Agricultural College, the University of Arizona, Bowdoin College, and the University of 
Akron, and the honorary doctor of laws from Cornell College, Mount Vernon, Iowa,
American University, the University of Chattanooga, the University of Michigan, and the
University of Cincinnati.

For his war work with the Office of Scientific Research and Development, he was 
awarded the Medal of Merit in 1948. On June 14, 1952, the class of 1913, Princeton
University, awarded its class memorial cup to Alan Tower Waterman “in recognition of his
meritorious and outstanding service to his profession and his country.” On March 19,
1957, the first annual Capt. Robert Dexter Conrad Award, established by the Office of 
Naval Research, was presented to Dr. Waterman in recognition of and reward for outstand- 
ing technical and scientific achievements in research and development for the Navy. 

Dr. Waterman has conducted research investigations in the field of conduction of
electricity through solids; thermionic, photoelectric emission, and allied effects; and electrical
properties of solids. 

He is a fellow of the American Association for the Advancement of Science, the 
American Physical Society, the American Association of Physics Teachers, and the New 
York Academy of Sciences. He is a member of the American Association of University 
Professors, the Washington Academy of Sciences, the Washington Academy of Medicine,
Phi Beta Kappa, Sigma Xi, the Scientific Research Society of America, and the Washington
Philosophical Society.

Dr. Waterman is a member of the Federal Council for Science and Technology, the 
National Aeronautics and Space Council, the Distinguished Civilian Service Awards Board, 
the Defense Science Board of the Department of Defense, the Committee on Specialized 
Personnel of the Office of Defense Mobilization, and is a consultant to the President’s 
Science Advisory Committee. 

Dr. Waterman is a member of the board of directors of the Center for Advanced Study 
in the Behavioral Sciences, the board of trustees of Atoms for Peace Awards, and of the 
board of directors of the American Association for the Advancement of Science. 

He was born June 4, 1892, in Cornwall-on-Hudson, N.Y. His legal residence is
Maryland. Dr. and Mrs. Waterman live at 5306 Carvel Road, Westmoreland Hills, Washington,
D.C. They have three sons and two daughters, all married. 

Dr. Waterman is a member of the Cosmos Club, Washington, D.C., and the Graduates 
Club, Yale University, New Haven, Conn. 

AUGUST 10, 1959.
the present time 11 groups in the United States are engaged in various
aspects of mechanical translation research with the support of the
Federal Government. The most recent conference in this country
devoted to mechanical translation research was held at the University
of California, Los Angeles, in February 1960, with the support of 
the Office of Naval Research. 

Scientists in the Soviet Union have also been active in mechanical 
translation research. They began work in this field in 1955, conduct- 
ing experiments on a computer at the U.S.S.R. Academy of Sciences. 
Both the Institute of Precision Mechanics and Computing Technique 
and the Steklov Mathematics Institute of the Academy of Sciences 
began research in this field at that time. Later, the Institute of 
Linguistics of the Academy, Leningrad University, and other institu- 
tions entered this research field. In May 1958, the First All-Union 
Conference on Machine Translation was held in Moscow. In April 
1959, a conference on mathematical linguistics was held in Leningrad 
which dealt largely with mechanical translation research. Soviet work 
in this field has been largely theoretical up to the present time, and 
very few experiments with computers have been mentioned in their 
literature. This is in contrast with the research in the United 
States, where experimentation with computers often has played a 
central role in the research process. 

British scientists have long been interested in mechanical transla- 
tion research, but full-scale research began only in 1955 when a grant 
for this purpose was made to Birkbeck College, University of London, 
by the Nuffield Foundation. In March 1957, the National Science 
Foundation and the Rome Air Development Center of the U.S. Air 
Force began joint support of the Cambridge language research unit in 
its mechanical translation research. In the spring of 1959, research in 
this field was begun by the National Physical Laboratory, at Tedding- 
ton, England, an organization that is roughly comparable to our 
National Bureau of Standards. 

Scientists in other countries have also been engaged in mechanical 
translation research. A group at the University of Milan in Italy 
has been studying this problem for a number of years, and since 
February 1959 has been supported by a contract with the Rome Air 
Development Center. 

In France, researchers interested in mechanical translation have 
recently formed an association for the study of problems in automatic 
translation and applied linguistics. In December 1959, a study center 
for automatic translation was established by the French National Cen- 
ter for Scientific Research. 

In Japan, the Electrotechnical Laboratory of the Japanese Govern- 
ment, Tokyo, has been conducting research in mechanical translation. 
Finally, according to a recent article in a Soviet journal, research 
in this field has been carried out in Communist China since 1958. 

I should like to turn now to the interests and activities of the Na- 
tional Science Foundation in this field. 

The Foundation, in its support of mechanical translation research, 
is interested not only in the specific objective of the automation of 
translation but also in the broader implications that this research on 
language may have for the future of information processing, particularly
in the fields of automatic indexing and abstracting and in the
mechanization of systems for the storage and retrieval of scientific
information. 

This is a very large problem, as you may well know. 

The interest of several other Federal agencies in immediate prac- 
tical results and their consequent support of several projects directed 
primarily toward such short-range objectives has made it possible for 
the Foundation to complement such work by emphasizing the support 
of projects engaged in longer range, more basic research. This 1s in 
line with the Foundation’s general policy of supporting worthwhile 
research which is not being supported by other agencies. 

A list of the projects that have been and are being supported by the 
Foundation is appended to this statement. It is the Foundation’s gen- 
eral practice, in this field as in other fields of research supported by 
the Foundation, where a proposal or application has been made to the 
Foundation requesting support for particular research, to submit such 
request to outside experts who assist us in evaluating it. In the inter- 
ests of coordination, we also distribute copies of proposals we are con- 
sidering to other Federal agencies supporting work in the field, and 
in some instances have supported projects jointly with other agencies. 

The Foundation supports only those projects that are considered to 
be of scientific merit and that usefully supplement or complement other 
work in progress. 

The Foundation has engaged in a number of activities designed to 
further cooperation among Federal agencies and among research 
groups in this field. The Federal agencies include those testifying in 
these hearings as well as the Office of Naval Research, which also 
supports research in this field. We have organized or in other ways 
encouraged formal and informal meetings both among administra- 
tors of the Federal programs in support of research in this field and 
among the research groups themselves. The next research conference 
will be held this summer under the joint sponsorship of ONR and 
the Foundation. We have held a number of discussions both with the 
other Federal agencies and with the outside research groups concern- 
ing criteria for effective reporting of research results in this field and 
have distributed to the research groups suggestions for reporting 
designed to make the exchange of research results as prompt and 
effective as possible. We have collected descriptive information about 
work in progress and for the past 3 years have published a semiannual 
report, titled “Current Research and Development in Scientific Docu- 
mentation,” which has included statements describing the work of 
mechanical translation research groups. This guide also lists the 
available publications and reports of the various groups. With the 
assistance of other Federal agencies we have seen to it that foreign 
language research papers in the field are translated and distributed 
to the research groups in this country. 

It is our belief that the problem of mechanical translation, in- 
volving as it does some of the most subtle aspects of human com- 
munication, is at the same time of the greatest importance to the 
scientific community and of the greatest difficulty, requiring years of
intensive research for its solution. We are convinced that the promising
results obtained thus far warrant continuation of support of
research on mechanical translation by the National Science Founda- 
tion and other Federal agencies. 
At this point, Mr. Chairman, as I suggested at the beginning, if it
meets with your approval, I would like to call on Dr. Adkinson 
to give you more of the technical aspects of this subject. 

The CHAIRMAN. We will be happy to hear from Dr. Burton W.
Adkinson, head of the Office of Science Information Service of the 
National Science Foundation at this time. 

I think we have a prepared statement from Dr. Adkinson. Isn’t 
that right, Doctor?

Dr. ADKINSON. That is correct, Mr. Chairman.

The CHAIRMAN. Before we do that, I want to acknowledge the fact
that we have representatives here this morning from the Kensington 
Junior High School of Kensington, Md. We are happy to have this 
young group with us this morning. I imagine all of them are inter- 
ested in mechanical translators. I recall when I was in school I would 
have liked to have a mechanical translator myself, as language gave
me trouble.

At any rate, Doctor, with that brief interlude, we would appreciate
your proceeding.


APPENDIX 


NSF grants for research on mechanical translation, fiscal year 1955 through 3d
quarter of fiscal year 1960


Grantee institution                   | Date of grant         | Duration    | Amount of       | Amount transferred
                                      |                       | of grant    | grant           | from other agencies
                                      |                       |             |                 | Amount   | Agency
Massachusetts Institute of Technology | October [illegible]   | [illegible] | [illegible]     |          |
                                      | October 1959          | [illegible] | 196,[illegible] |          |
Georgetown University                 | [illegible]           | [illegible] | 100,000         | $65,000  | CIA
Cambridge Language Research Unit      | March 1957            | 1 year      | 27,100          | 20,000   | RADC
                                      | December [illegible]  | [illegible] | 33,000          | 20,000   | RADC
Harvard University                    | January 1958          | 6 months    | 29,150          | 15,000   | RADC
                                      | September 1958        | 1 year      | 220,000         | 70,000   | RADC
                                      | December 1959         | [illegible] | 200,000         | 100,000  | RADC
University of [illegible]             | September [illegible] | 1 year      | [illegible]     |          |

NSF grant for research related to mechanical translation


Grantee institution                                | Date of grant | Duration of grant | Amount of grant
University of Pennsylvania (syntactic analysis of  | October 1956  | 3 months          | $1,950
  English for information retrieval).              | February 1957 | 1 year            | 24,300
                                                   | February 1958 | 16 months         | 42,300
                                                   | October 1958  | 6 months          | 31,450


Dr. ADKINSON. Thank you.


STATEMENT OF DR. BURTON W. ADKINSON,² HEAD, OFFICE OF
SCIENCE INFORMATION SERVICE, NATIONAL SCIENCE FOUNDA- 
TION 


Dr. ADKINSON. Mr. Chairman and members of the committee, as
Dr. Waterman mentioned, I will address myself to a discussion of 
the different approaches and objectives in this field of research, and 
its relationship to the broader field of applied linguistics and mecha- 
nized information processing in general. 

Although the work in this field is often described simply as mechanical
translation research, there are actually wide differences, as Dr.
Waterman said, in methodology, subject matter, and objectives among 
the various research groups. The research of some groups, for exam- 
ple, centers around computer experiments. On the other hand, some 
groups have had no recourse to computers at all, or have made only 
incidental use of such equipment. This difference in the use of equip- 
ment often reflects wide differences in the nature of the research. 
Even among those groups that use computers there is an important 
difference in approach or methodology. Some groups use a computer 
merely to verify that the procedures they have worked out function 
properly. Others prefer to use computers as a means of carrying 
out experiments with natural languages in order to learn more about 
language itself. These two uses of computers may be kept quite dis- 
tinct in some groups and blended in others, but in any case are indica- 
tive of important differences of approach. 

Another difference among groups is in the languages under study. 
Most of the groups in this country are studying translation from 
Russian to English because the practical need here is great. The 
large amount of Russian-to-English translation now being done by 
human translators is indicative of this need. Some groups, however, 
are also studying the problems involved in translation from other 
languages, such as French and German. 

Of course, in other countries work is often concentrated on trans- 
lation into the native language of the researcher, with the corollary 


² Biographical sketch of Burton W. Adkinson:

Educational background: B.A., 1936, University of Washington; M.A., 1939, University 
of Washington; Ph. D., 1942, Clark University.

Positions held and dates: Teacher, public schools, Washington State, 1929-39; regional 
research assistant, office of the geographer, 1942-43; research associate and Assistant 
Director, Board of Geographic Names, 1943-44; Assistant Chief, Map Intelligence Section, 
Office of Strategic Services, 1944-46; Assistant Chief, Map Division, Library of Congress,
1945-47; Chief, Map Division, Library of Congress, 1947-48; Assistant Director, Reference
Department, Library of Congress, 1948-49; Director, Reference Department, Library
of Congress, 1949-57; head, Office of Science Information Service, National Science
Foundation, 1957 to present.




RESEARCH ON MECHANICAL TRANSLATION 7 


that exchange of results on an international basis is often of interest 
largely from a theoretical or abstract point of view, since the actual 
detail with respect to any one language may be of little direct interest 
to a researcher studying another language. 

At the present time it is impossible to say to what extent methods 
for translating the literature of, say, physics will be applicable to 
translation of articles in some other field, say biology. It is clear that 
there are at least some differences in vocabulary which may make it 
necessary to study each discipline in which the researcher is interested 
if machine translation is ever to be achieved on anything comparable 
to the human level. At present in the United States, the groups 
studying the problem of Russian-to-English translation are concentrating
on such disciplines as physics, electronics, mathematics, chemistry,
and biochemistry.

Groups differ greatly in their objectives. Some are aiming at a 
crude and yet useful product in the most immediate future. Others 
are interested only in high-quality translation by machine requiring 
no polishing or editing for its use. Regardless of the immediate 
objectives, however, the ultimate goal of mechanical translation re- 
search is complete automation of the process of translation, and we, 
in the Foundation, regard the terms “mechanical translation,” “machine
translation,” and “automatic translation” as referring to this
goal of completely automated translation. The work thus far indicates
that there is a strong possibility that this goal will one day be
attained. Much more research is needed, however, to determine 
whether or not this ultimate goal is indeed possible, and if possible, 
economically feasible. 

It is possible that the crude machine output that can be produced 
at this time, consisting mostly of ungrammatical sequences of words, 
might be found useful by some organizations as indicative of the con- 
tent of processed material. We do not believe, however, that such 
unedited output will be useful to research scientists. 

There remains the additional possibility that machines may some- 
how be used to aid human translators in their work, so that the re- 
sulting man-machine complex will be able to translate either faster 
than a human expert, or more economically, or both. We understand
that consideration is being given to the possibility of utilizing
some of the intermediate research results and procedures to produce 
machine output that would then be converted by humans into usable 
translations. We have as yet, however, seen no convincing evidence 
that such partial automation of the translation process at this time 
would be an improvement over existing human translation. Before 
any sound conclusions can be reached concerning the usefulness of 
partially mechanized translation, there is need for study and objec- 
tive evaluation of (1) the quality of the machine output achieved by 
mechanized procedures developed by several of the research groups 
and (2) the amount of human effort required to convert the machine 
output into usable translations. 

Some of the research groups and sponsoring agencies have a still 
broader objective and are interested in mechanical translation as one 
aspect of a much larger problem, that of processing natural language 
by machine for a wide variety of purposes, including automatic 
abstracting and automatic indexing for storage and retrieval systems.

More and more, scientists are coming to the view that mechanical 
translation research must be viewed in this larger context of the 
automatic processing of natural language before its significance can be 
fully understood. An example of a group taking this point of view 
is that at the Rand Corp., which included the following statement in 
a recent report on its work:

A major report on Rand’s mechanical translation research is now in prepara- 
tion; any further results in it will be obtained as byproducts of the more 
general linguistic research envisioned. 

As another example, it should be noted that a Soviet publication 
formerly called “Bulletin of the Society for Machine Translation” 
had its name changed in 1959 to “Machine Translation and Applied 
Linguistics,” indicating a broader scope. Furthermore, the April 
1959 Conference on Mathematical Linguistics held in Leningrad dealt 
with this larger theme. One Soviet researcher, in an article entitled 
“Machine Translation in the Soviet Union,” has said: 
Today machine translation is regarded only as the first stage toward solving
a more general and more important problem: by most fully using electronic ma- 
chines as auxiliary tools of human thinking, to make the machine capable of 
performing the widest possible operations with texts written in different lan- 
guages, to enable it not only to translate but also to edit, make abstracts, fur- 
nish bibliographical and other references, etc. All these operations boil down 
to extracting from the text required information and to recording that informa- 
tion in some other form. 

The newly formed French society in this field is also devoted to the 
larger area of automatic translation and applied linguistics. 

One consequence of this broadening of scope is that it will become
more and more difficult as time goes on to decide whether or not a 
particular group is engaged in mechanical translation research. As
more has been learned about the problems involved, and a greater 
understanding attained concerning language in general, it has become 
clearer that various separate aspects of the problem must be studied 
independently before any general solution of the problem of automatic 
translation can be obtained. For example, it has now become clear to 
most mechanical translation researchers in this country that auto- 
matic syntactic analysis, or parsing, of the sentences of the language 
to be translated must be achieved before high-quality translation will
be possible; and most of the progress in the research thus far has 
consisted of experimental programs for accomplishing this analysis. 
This is not to say that the achievement of automatic syntactic analy- 
sis will mean that the whole problem of mechanical translation has 
been solved, but rather, that only one aspect of the problem has been 
solved. 

As for the problem of automatic parsing, a group of researchers at 
the University of Pennsylvania is studying the problem with respect 
to English. The work of this group provides an excellent example of 
the difficulty of classifying research in this area in that the Pennsyl- 
vania work is directed toward the storage and retrieval of informa- 
tion and is usually not considered to be mechanical translation re- 
search, even though the work the group is doing on English is very 
similar in principle to that which is occupying much of the time of 
some of the mechanical translation projects at present. Thus any 
evaluation of progress in the field of mechanical translation research 
must take into account the progress in the larger field of the automatic
processing of natural language. Conversely, it must be realized
that progress in the narrower field of mechanical translation research 
may well contribute to the solution of other related problems in the 
broader field. 

This concludes my statement, Mr. Chairman. 

The CHAIRMAN. Thank you very much, Dr. Adkinson.

We appreciate that statement very much. 

Now, Dr. Waterman, are there any further statements from your 
group, the National Science Foundation, at this time?

Dr. WATERMAN. We have no further statements, Mr. Chairman,
except the submission of this appendix of the list of our grants made 
for the purpose, some by the Foundation alone and some jointly. 

The CHAIRMAN. Would you like to analyze those grants very
briefly, for the record?

Dr. WATERMAN. May I ask Dr. Adkinson to speak to that? We
are quite ready to answer any questions you may have, Mr. Chairman.

The CHAIRMAN. Yes, sir. 

Dr. ADKINSON. This listing of grants which is an appendix to the
prepared statement of Dr. Waterman, as you notice, indicates the Na- 
tional Science Foundation grants for research on mechanical trans- 
lation for the fiscal year 1955 through the third quarter of fiscal year 
1960. It has been appended in order to give the committee a picture 
of the institutions and the amount of support that they have been 
given for this particular type of work. 

It also gives the committee a picture of the cooperative support 
of certain of the mechanical translation projects, among two or more 
of the Government agencies. In other words, this has not been a 
program within the Federal Government where each agency was 
working in isolation. As you can see from this, there has been joint 
support. There has been close cooperation and coordination in the 
support of research on this very difficult problem. 

This is all it was to illustrate, and it gives you roughly the
amount of NSF funds expended and the amount of money that was 
transferred to us to help support some of these projects. 

On the second page of this appendix is the money that has been 
given to the University of Pennsylvania to work on the problem of 
analyzing English which we think has some relation to mechanical 
translation and has a great deal of significance to the larger problem, 
the automation of the abstracting and indexing of information in the 
natural language of English. 

I think that is all I need to say, sir. 

The CHAIRMAN. Dr. Waterman, the Foundation has been financing
grants in the field of machine translation research since 1955. Public
Law 85-864 authorizes the Foundation to undertake programs to develop
new or improved methods, including mechanized systems for
making scientific information available. 

I believe it was the intent of the law to make the Foundation a 
focal point for research in this area. It was not, however, until 
March 10, 1960, that the first meeting of the Interagency Committee
on Mechanical Translation Research met. 

What are the objectives of this committee, and would you explain 
why it took so long to get the committee started into operation?




Dr. WATERMAN. This was the first formal organization of the committee,
I believe, Mr. Chairman.

Of course, the responsibility assigned by the Congress to us was 
broader than just mechanical translation. We had the problem of 
organizing and extending the Office of Scientific Information Service, 
and we had to appoint and assemble the Council which was a statutory 
requirement. Then the Council had to turn its attention to these 
things. 

I believe that the delay was merely one of setting this thing up formally;
we had been in touch with the agencies before that were known
to be doing this work. In fact, you will notice that at Georgetown 
University and the Cambridge language research unit, and at Har- 
vard University, joint support was started as early as 1956, very much 
before the appearance of this statute. 

Dr. Adkinson may wish to add to my statement. 

Dr. ADKINSON. Yes, Mr. Chairman.

There had been, before March 1960, a number of meetings, informal 
meetings, on a rather regular basis, to discuss the problems of re- 
search in mechanical translation and administration of support. 

There also is a formal group that meets within the intelligence com- 
munity. It seemed to us it was necessary to have a committee repre- 
sentative of the Federal agencies supporting research in the field to 
consider ways of facilitating and coordinating the research itself, so 
we formed this committee in 1960, formalizing the informal meetings 
that were going on, and giving the committee a formal status rather 
than merely an informal status. It is an outgrowth of those other 
meetings. 

The CHAIRMAN. Dr. Adkinson, does the committee have the same
representation that the CIA Subcommittee on Mechanical Transla- 
tion has? 

Dr. ADKINSON. Roughly the same, I believe. The CIA subcom- 
mittee, however, is comprised of representatives of the Central Intel- 
ligence Agency, the military departments, the National Security 
Agency, and the Department of State. These last two agencies are 
not supporting research in mechanical translation. In addition, a 
representative of the National Science Foundation is an associate 
member of the subcommittee. The members of the Interagency Com- 
mittee on Mechanical Translation Research, established recently by 
the Foundation, are representatives of the five agencies supporting re- 
search in the field: the National Science Foundation, the Central In- 
telligence Agency, and the three military departments. 

The CHAIRMAN. Do the committees have duplicate or separate 
functions? 

Dr. ADKINSON. That is right. They are considering the same field, 
but one is concerned with the interests of the intelligence community, 
and makes provisions for classified discussions. The committee that 
was set up informally—or now formally—that had met informally 
before, is for the purpose of discussing the administration of me- 
chanical translation research in an unclassified manner. 

The CHAIRMAN. In other words, the committee we are discussing 
now is for unclassified discussions? 

Dr. ADKINSON. That is right. And with respect to function, the 
CIA subcommittee advises the Committee on Documentation of the 
U.S. Intelligence Board with respect to all machine translation activ- 
ities, coordinates MT activities within the intelligence community, 
and informs its members of new projects and of the status of existing 
projects. The function of the Interagency Committee on Mechanical 
Translation Research is to provide an opportunity for the sponsors 
of the research projects to achieve closer cooperation through in- 
creased understanding of their various objectives and policies; and 
also to consider ways of facilitating research, such as the scheduling 
of research conferences and requirements for the prompt and effective 
exchange of research results among the various groups. It is in- 
tended that this interagency committee will periodically report on the 
progress of mechanical translation research through the Federal Ad- 
visory Committee on Scientific Information to all Federal agencies 
that make use of translations of scientific materials. 

The CHAIRMAN. There are five Government agencies sponsoring 
research on machine translations—the Army, the Navy, the Air 
Force, CIA, and NSF. 

Do you consider this number to be proper for the pursuit of this 
single goal? I will ask Dr. Waterman or Dr. Adkinson, either one. 

Dr. ADKINSON. I would like to answer it in this way, Mr. Chair- 
man. These are organizations that had an interest. They are sup- 
porting research that is useful in attacking the problems of mechan- 
ical translation in several different ways, in several subject fields and 
working with different languages. And I do not think that you 
could say that the number is good or bad, but it is the number that 
have an interest in the field. They have found it advisable to sup- 
port mechanical translation research, and by coordination of their 
administrative efforts I think that the program has progressed well 
with this multiple support. Whether there should be more, I don’t 
think it is necessary, unless another agency felt the need to get in it. 
They have not. So I would say, Yes, it seems to be a useful number. 

The CHAIRMAN. Do you think that one or two could do the same 
thing? 

Dr. ADKINSON. They might be able to, one or two, but the agencies 
are supporting mechanical translation research at this time for dif- 
ferent purposes; that is, as I mentioned in my paper, we are interested 
in the long-range goal of completely automated translation as well 
as the larger goal of “Can we use the machines for handling the 
natural language?” 

That is our interest. 

Then you find that the military and intelligence agencies have a 
tremendous need for translations on a current basis. They are trying 
to get into production as rapidly as possible. They have enrsiee 
research. 

We are not in a position at the present time to say that the research 
done by one group is better than the research done by another. We 
can say, one group has done this, the other group has done that, 
and one may be a little different than the other, but which is better 
we are not in a position to say at this time. 

So that the exploration of the different avenues of research have 
been good. This is true in research in many fields, not only mechan- 
ical translation. You have multiple attacks on the problem from dif- 
ferent viewpoints, plus different end uses. 

The CHAIRMAN. With all of these agencies working, would it be 
practical for the National Science Foundation to provide computers 
for instance for research for all of them? 

Dr. WATERMAN. First let me say, Mr. Chairman, that in this respon- 
sibility of ours for science information exchange, we believe the intent 
of the legislation was for us to take a leading part in seeing to it 
that the work done by other agencies was well known by all of them, 
and that each had a definite part to play, and this composite method 
of working out the approach was preferable to the Foundation trying 
to take over the entire operation. And as Dr. Adkinson said, other 
agencies have pressing interests in certain aspects of the field, so 
that the best system seems to us to be the one where an agency with 
a specific interest is encouraged to cover that area, provided there 
is general cooperation among all. 

So we have taken that standpoint. 

Now, in the matter of providing computers for research, I believe 
computers are generally available to these other agencies which can 
be used for their purposes. The position of the Foundation is that 
if they are not available we can be ready to be of some assistance, 
as, for example, in our program on capital research facilities where 
among other things we are engaging in trying to establish large 
modern computing centers in various universities throughout the 
country so that university departments have access to computing 
facilities. 

Now, in cases like the universities mentioned where they have an 
interest in linguistics, this, of course, can be made available to them. 
I will ask Dr. Adkinson to respond further to this question. 

Dr. ADKINSON. Yes, the funds that have been made available have 
been for research and for the rental of computer time on computers 
available to these research groups in or near their organizations. 
They haven’t been for the purchase of computers as far as I know. 
There has been in recent years money made available for developing 
new attributes of these machines—to assist, say, in developing large 
memories, dictionaries, which can hold a large amount of material in 
the machine so that words can be looked up. We have to have a dic- 
tionary when we are working with a foreign language. They have to 
have a dictionary in the machine, and some work has been done along 
this line. But in general, the money made available was for the rental 
of computer time to test out the theories and procedures which had 
been developed by the researchers. 

The CHAIRMAN. Well, are we making such progress on this that 
you could say it is a usable machine at this time? 

Dr. WATERMAN. This depends on the purpose for which one wants it. 

Speaking in general, if an agency or any group would like general 
information about a subject, for example, what the interest of a 
foreign country is in the whole subject, or in some phase of it, a 
crude translation will suffice. Thus one can know the topics that 
are being considered for example, and perhaps how they are treated. 
That is one extreme, where a crude translation will serve, and for 
some groups that is sufficient. 

Speaking specifically for the field of science, I think you all know 
that scientists are very particular about translations. In fact, it is 
very hard to find a human translator of science who will satisfy a re- 
search scientist. There are fine points in the translation which matter 
very greatly to him. Ideally he would prefer to read the language 
himself, as scientists have learned to do in French and German over 
the years; but when it comes to a language with which he is not 
familiar, then he can quite often find fault with a human translator 
unless the translator is another specialist in his field and also knows 
the language very well. 

So for research, especially basic research in science, one needs to 
have a very accurate translation. One can say then, regarding the 
present progress, that it may be sufficient for some groups, for certain 
interests. It is not sufficient for research science, as yet, nor for pre- 
cise translations. We are some distance from that. Would you not 
agree, Dr. Adkinson? 

Dr. ADKINSON. Yes; and in response to your particular question, 
I may have misled the chairman and the committee when I spoke 
about computers, because there are several types of computers that are 
used, and I was addressing myself to the procedures and the tech- 
niques for using these computers for translating. There are a num- 
ber of different types of computers now actually being used. So it 
isn’t the machine, it is the system that is used, the program and the 
philosophy, and the understanding of language, that is necessary to 
develop an efficient way to use the computer to do the translating, not 
the machine itself. The real tough part of it is the understanding of 
the language and developing a program for the computer. 

The CHAIRMAN. Mr. Fulton? 

Mr. FULTON. I am glad to have you here. 

On the end of Dr. Waterman’s statement there is a list, an appendix, 
of the National Science Foundation grants, for research on mechani- 
cal translation from the year 1955, through the third quarter of fiscal 
year 1960. 

I would like to have you submit a statement of what the instructions 
and purpose was on each of these grants. Likewise, what the progress 
has been on the information or the advances or developments 
we have received for our money. 

Dr. WATERMAN. I will be very glad to submit that. 

Mr. FULTON. Then we can see how the money has been spent, and 
in what fields these people were operating, as well as an evaluation 
of the results. 

(The information requested is as follows:) 

The Foundation is pleased to submit the additional information requested 
concerning the instructions to and purposes of each mechanical translation 
research group that has received Foundation grants, and the accomplishments 
of each group thus far. We should like to explain first that the Foundation 
does not give instructions to the research groups working under Foundation 
grants. It is our belief that research can best be planned by the research scien- 
tists and technical specialists in each field of research, and not by the Founda- 
tion. It may be helpful to explain also the way in which the Foundation 
reaches decisions on applications, or proposals, for funds in support of research 
projects. A Foundation booklet, “Grants for Scientific Research,” serves as 
a guide to persons and organizations who wish to submit to the Foundation 
proposals for research grants. The booklet sets forth the following requirements 
with respect to the description of any proposed research project: 

“Proposal descriptions should begin with a brief abstract of the proposed 
research. This should be followed by a more detailed statement of the work 
to be undertaken; its objectives and expected significance; its relation to the 
present state of knowledge in the field; its relation to previous work done on 
this project and to similar or related work in progress elsewhere; also a bibliog- 
raphy of pertinent literature. The general plan of work, including the 
broad design of experiments to be undertaken, if any, and the procedure antici- 
pated should be outlined. The appraisal by Foundation advisers and staff 
members as to the scientific merit of the proposed research will be influenced 
by the adequacy and character of this information.” 

In addition, each proposal contains information about the time period for 
which support is requested, the facilities available for the work, the background 
of the investigator who will be responsible for direct supervision of the proposed 
research and of other members of the research staff, and the estimated budget. 
Each research proposal is carefully reviewed both by staff members of the 
Foundation and by outside experts who are asked to assist in evaluating the 
scientific merit of the proposed research, the competence of the investigators 
who will be doing the work, and the reasonableness of the budget. 

If a proposal results in a grant of funds, the Foundation has in effect ap- 
proved the description of the proposed research and the general plan of work 
outlined in the proposal, and the research group follows that general plan, with- 
out, however, being bound to it absolutely; for in the course of its research a 
group may find it advisable to give some attention to a new approach or a new 
task not specifically mentioned in the research proposal. The Foundation ex- 
pects to be kept informed, of course, whenever any significant departure in 
the plan of research is made. 

The summaries of the plan of research and the accomplishments of each proj- 
ect that has received Foundation grants for mechanical translation research 
are as follows: 

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 


Summary of plan of research 


The objectives of the work on mechanical translation at MIT have remained 
essentially unchanged since initiation of the project in 1954. The primary 
objective of this basic research program is to find out how languages can be 
translated by machine. Secondary objectives are concerned with evaluating 
the fidelity which can be achieved with different approaches, the usefulness of 
the resulting translations for various purposes, and their respective costs. A 
further objective is to add to the general knowledge of noncomputational use 
of digital computing machinery and to a basic understanding of human 
communication. 


Results achieved 


In light of these objectives considerable progress has been made. After con- 
sideration of the fidelity that could be achieved by various suggested techniques, 
it soon became evident to the MIT group that more knowledge of language and 
the translation process would be needed. Their most significant advances have 
been of a basic and fundamental nature, which will help to make it possible 
eventually to program computing machines to produce accurate and acceptable 
translations. The work that has been done on generative grammar and the 
theory of grammatical transformations is believed to represent an important 
advance in linguistics, making possible more precise descriptions of language and 
shedding considerable light on the relationship of syntax to some aspects of 
meaning. Early work, showing the necessity for sentence-for-sentence transla- 
tion rather than word-for-word translation, has now met with full acceptance 
by all groups working in the field. The conceptual framework that the group 
introduced nearly 2 years ago advanced the idea that mechanical translation 
should be a three-step process: analysis of the incoming sentence, choice of 
appropriate components of the output sentence, and synthesis of the output. 
This conceptual framework is gaining acceptance by an increasing number of 
mechanical translation groups. Much of the work has been concerned with the 
preparation of detailed grammars of English, German, and French and with 
continuing studies of some of the formal features of linguistic expressions, such 
as expressions of negation. This work is rapidly reaching fruition. The group 
first had to find out the best way of representing the grammar of a language 
for use in a machine. At the same time they have devised techniques for using 
the machine to aid in their research, including a programing language known 
as the COMIT system, for use with machines in linguistic work. 
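
(Illustration. The three-step scheme described above, that is, analysis of the incoming sentence, choice of output components, and synthesis of the output, can be sketched in present-day Python as follows. This is a minimal sketch under stated assumptions and is not the MIT group's program or the COMIT system: the six-word German lexicon, the part-of-speech tags, and the single reordering rule are all hypothetical.)

```python
# Illustrative sketch only: a toy version of the three-step scheme
# (analysis, choice, synthesis). The lexicon, tags, and reordering rule
# are hypothetical and are not taken from the MIT group's work.

# Hypothetical German-to-English lexicon: word -> (part of speech, English)
LEXICON = {
    "der": ("DET", "the"),
    "hund": ("NOUN", "dog"),
    "hat": ("AUX", "has"),
    "den": ("DET", "the"),
    "ball": ("NOUN", "ball"),
    "gesehen": ("PARTICIPLE", "seen"),
}


def analyze(sentence):
    """Step 1: label each incoming word with its part of speech."""
    words = sentence.lower().rstrip(".").split()
    return [(w, LEXICON[w][0]) for w in words]


def choose(analysis):
    """Step 2: choose an output-language component for each input word."""
    return [(LEXICON[w][1], tag) for w, tag in analysis]


def synthesize(components):
    """Step 3: build the output sentence, moving a clause-final
    participle to stand directly after its auxiliary."""
    out = list(components)
    if out and out[-1][1] == "PARTICIPLE":
        participle = out.pop()
        for i, (_, tag) in enumerate(out):
            if tag == "AUX":
                out.insert(i + 1, participle)
                break
        else:
            # No auxiliary found: leave the participle where it was.
            out.append(participle)
    text = " ".join(word for word, _ in out)
    return text[0].upper() + text[1:] + "."


def translate(sentence):
    return synthesize(choose(analyze(sentence)))


print(translate("Der Hund hat den Ball gesehen."))
```

A word-for-word rendering of the same sentence would leave the participle at the end ("the dog has the ball seen"), which is the kind of result the sentence-for-sentence approach is meant to avoid.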


GEORGETOWN UNIVERSITY 


Summary of plan of research 


In this project it was proposed to extend the results obtained in the experiment 
conducted in 1954 by the institute of languages and linguistics of the university 
and the IBM Corp.; and to develop the additional rules required for the trans- 
lation of complete texts. Research was to be concentrated initially on the 
analysis of contemporary Russian texts in the field of organic chemistry. An 
experimental approach was conceived in which groups following three different 
approaches would be permitted to study the problem to see which would prove 
the most effective. A program of computer experiments aimed at gradual im- 
provement of one or more systems was planned. Work on other languages was 
planned to complement the work on Russian. 


Results achieved 


All three of the experimental procedures for translation of Russian were 
carried to the point of testing on computers. One of the methods, that of 
Paul Garvin, dealt primarily with the analysis of Russian syntax, while the 
other two methods were aimed at actual translation. It was shown that both 
the “code matching” technique and the “general analysis” technique had been 
programed to the point where crude output in English words, in which some of 
the problems of translation had been solved, could be demonstrated. The gen- 
eral analysis technique was selected for further study. Up to the present, a 
corpus of 268,000 running words has been utilized in the preparation of a dic- 
tionary of 10,800 entries. Furthermore, 115,000 words of Russian text have 
been processed by computer, and much of the output has been studied for the 
purpose of improving the programs. Recently, the code matching technique has 
been the subject of further study at the Corporation for Economic and Industrial 
Research at Arlington, Va., and the work of Paul Garvin has been continued 
at the Thompson Ramo Wooldridge Corp., Los Angeles, Calif. 

As for other languages, an experimental system has been developed for French- 
to-English translation and has been brought to the point where French nuclear 
physics texts can be converted into English words which in many cases convey 
the thought of the original. Preliminary research has been conducted on the 
problems of translating English into Chinese and English into Arabic. 

During the past year, all of the support of this project has come from the Cen- 
tral Intelligence Agency, and the work outlined above has been continued and 
extended to other languages. 


CAMBRIDGE LANGUAGE RESEARCH UNIT 


Summary of plan of research 


The unit proposed to investigate the possibility of using a specially con- 
structed mechanized thesaurus in the production of idiomatic translations by 
machine. To this end they planned to study the application of logic and other 
branches of mathematics to syntactic analysis; to extend descriptive linguistic 
analysis to give the cross-relations between passages in a language and transla- 
tions of them into another language; and to construct comprehensive, ready- 
to-use mechanical dictionaries and programs for machine translation. 


Results achieved 


Much of the study of this group has been devoted to the semantic aspects of 
natural language and how to deal with them. A careful study of existing 
thesauri has been carried out and has served as a starting point for various 
experimental thesaurus-like word classification schemes which indicate the ways 
in which words are semantically related to each other. These classification 
schemes have the same form as mathematical partially ordered systems, and the 
unit is attempting to show that they can be so modified as to form more special- 
ized mathematical systems known as lattices. The group believes that word 
schemes in lattice form will be a useful tool for natural language processing, 
including mechanical translation and abstracting, and information retrieval. 
As an example of the last type of application, a retrieval system for several 
hundred books has been worked out and is being expanded. Work on one 
particular mechanical translation scheme from Italian to English is well ad- 
vanced, and other work, including construction of translation procedures based 
on syntactic categories, is being carried on simultaneously. 
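
(Illustration. The distinction drawn above between a partially ordered system and a lattice can be made concrete with a short present-day Python check. A partial order is a lattice when every pair of elements has exactly one least upper bound and exactly one greatest lower bound. The sketch below is offered under stated assumptions only: the tiny word classification and the function names are hypothetical and do not reproduce the unit's thesaurus or procedures.)

```python
from itertools import combinations

# Hypothetical classification: each pair (a, b) means "a is narrower than b".
# "(no word)" is an artificial bottom element added so that every pair of
# words has a common narrower element.
COVERS = [
    ("(no word)", "heat"), ("(no word)", "light"),
    ("heat", "energy"), ("light", "energy"),
]


def order_relation(covers):
    """Return the elements and the reflexive-transitive closure of the
    covering pairs, i.e. every pair (a, b) with a <= b."""
    elements = {x for pair in covers for x in pair}
    leq = {(x, x) for x in elements} | set(covers)
    changed = True
    while changed:
        changed = False
        for a, b in list(leq):
            for c, d in list(leq):
                if b == c and (a, d) not in leq:
                    leq.add((a, d))
                    changed = True
    return elements, leq


def is_lattice(covers):
    """True if every pair has a unique least upper and greatest lower bound.
    Assumes the covering pairs contain no cycles."""
    elements, leq = order_relation(covers)
    for x, y in combinations(elements, 2):
        uppers = {u for u in elements if (x, u) in leq and (y, u) in leq}
        lowers = {l for l in elements if (l, x) in leq and (l, y) in leq}
        joins = [u for u in uppers if all((u, v) in leq for v in uppers)]
        meets = [l for l in lowers if all((v, l) in leq for v in lowers)]
        if len(joins) != 1 or len(meets) != 1:
            return False
    return True


print(is_lattice(COVERS))
```

Dropping the two "(no word)" pairs leaves "heat" and "light" with no common narrower element, so the same classification is still a partial order but no longer a lattice; this is the sort of modification the paragraph above refers to.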


HARVARD UNIVERSITY 


Summary of plan of research 

It was proposed to extend the preliminary research on the structure of the 
Russian and English languages being carried out at the Harvard Computation 
Laboratory, in the light of the conviction that the processes of translation were 
not well enough defined to justify construction of any complete translation sys- 
tem. The initial effort was to be devoted to the formulation of efficient tech- 
niques for the compilation and maintenance of an automatic dictionary in order 
to provide an experimental tool to facilitate research still needed to develop 
methods for high-quality translation and a system for automatic word-by-word 
translation. Continued research into methods for achieving faithful, smooth 
translation from Russian to English was planned. 


Results achieved 

During the first 2 years of research, computer programs for the Univac were 
written which permit the operation of an automatic dictionary containing about 
15,000 Russian words. The programs permit the recognition of any of the 
15,000 words in any one of their forms, making it possible to process over 
150,000 distinct Russian word forms. This automatic dictionary has been used 
to produce word-for-word translations of scientific Russian texts, which are not 
true translations since they fail to take account of the grammar, but which have 
proved useful for some purposes in lieu of actual translations. The techniques 
and procedures which have been developed are applicable to the whole field of 
compilation and operation of automatic dictionaries. 
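
(Illustration. The figures above imply about ten recognizable forms for each dictionary word, 150,000 forms from 15,000 words. One way such recognition might work is to store stems and a list of endings separately, as in the present-day Python sketch below. This is a minimal sketch under stated assumptions and not the Harvard programs for the Univac: the three stems, the list of endings, and the function names are hypothetical.)

```python
# Illustrative sketch only: stems, endings, and glosses are hypothetical
# and do not reproduce the Harvard automatic dictionary.

STEMS = {            # stem -> English gloss
    "атом": "atom",
    "реакци": "reaction",
    "нов": "new",
}

# A small, incomplete set of inflectional endings ("" means no ending).
ENDINGS = {"", "а", "у", "ом", "ы", "ов", "я", "и", "ю", "ей",
           "ый", "ая", "ое", "ые"}


def look_up(word_form):
    """Return the gloss for a word form, trying the longest stem first.
    Returns None when no stem-plus-ending split is found."""
    form = word_form.lower()
    for cut in range(len(form), 0, -1):
        stem, ending = form[:cut], form[cut:]
        if stem in STEMS and ending in ENDINGS:
            return STEMS[stem]
    return None


def word_for_word(text):
    """Gloss each word in its original order; unknown words stay in brackets."""
    return " ".join(look_up(w) or f"[{w}]" for w in text.split())


print(word_for_word("новые реакции атомов"))
```

The output keeps the Russian word order and discards case and number, which is why such glosses, as the passage above notes, are not true translations.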

The automatic dictionary has also been used as a tool both to compile language 
statistics and to conduct research on syntax and syntactic analysis. Recently 
developed programs for syntactic analysis are based on the work of Mrs. Ida 
Rhodes of the National Bureau of Standards. Programs are now in operation 
which provide a partial syntactic analysis of Russian sentences on an experi- 
mental basis. 


UNIVERSITY OF CALIFORNIA, BERKELEY 


Summary of plan of research 

The purposes of the project were (1) to analyze a large amount of scientific 
Russian text in order to provide the information necessary for the preparation 
of a mechanical translation program; and (2) to write and test such a program. 
To minimize the size and complexity of the vocabulary, it was decided to restrict 
the scope of research initially to one area of science, but to design the translation 
system in such a way that it can be readily adapted to other fields. The more 
specific research tasks include compilation of an automatic dictionary and 
programs for its use, development of a system for the automatic parsing of 
sentences, and development of a mechanized system for analyzing Russian text 
and compiling data about the language as it is used in current scientific publica- 
tions. The major part of the analysis is devoted to the solution of the “multiple 
meaning” problem, which will require the analysis of several hundred thousand 
running words of text. Programing of the translation mechanism will proceed 
hand-in-hand with the linguistic analysis, and the results of the latter will be 
incorporated into the program as they become available. 


Results achieved 


As a result of discussions with representatives of the Central Intelligence 
Agency and the National Science Foundation, the group decided to concentrate 
on the field of biochemistry, rather than nuclear physics, as originally suggested, 
since there already was a group studying nuclear physics. 

Systems and research tools which have been produced thus far include (a) 
a maximally effective segmentation system for splitting Russian words into 
component parts, (b) a coding system for Russian grammar, (c) a Russian-to- 
English dictionary with a vocabulary coverage of over 300,000 words, (d) an 
automatic dictionary system which can accomplish look-up and segmentation at 
a rate of 7,500 words per minute when used on an IBM 704 computer, (e) a 
system for analyzing Russian text, (f) an exhaustive analysis for 30,000 words 
of text, (g) a linguistic data gathering program for obtaining information from 
analyzed text by means of an IBM 704, (h) a system for coding Russian scientific 
text for input, (i) a catalog of situations in which changes in order of words 
are required when translating from Russian to English, and (j) a method for 
automatic parsing of Russian text. 

Mr. FULTON. The question of translation, of course, by machine I 
think first came up in Popular Mechanics magazine in the early 1900’s. 
It isn’t a new question. They thought it might be something like a 
telephone you talked into, and then you would have some sort of an 
instrument like our dial. Do you recall that early article? 

Dr. WATERMAN. I would just say these Popular Mechanics ideas are 
very wide and they are likely to hit on something of this sort, of course, 
just in thinking of possibilities that lie ahead. They also make lots 
of comments that don’t ever materialize. But they do hit the nail 
on the head in many cases. 

Mr. FULTON. The remarkable thing is that where the idea was out 
of the blue, it has now materialized. I will put in the record the item 
from Popular Mechanics; they did have this kind of a thing. 


[From Popular Mechanics magazine, January 1958, p. 142. This is a reproduction of an 
item published in the February 1910 issue of Popular Mechanics.] 


AN INTERPRETING TELEPHONE 


The reader must not be led to think by the above head that any such wonder- 
ful device has yet been invented, or that there is anything but a very distant 
likelihood of such a device in the future. Yet, every great invention has ap- 
peared impossible to the many and difficult to most before its arrival, and very 
simple afterward. 

The interpreting telephone, however, is nothing more or less than the prophecy 
of a Michigan inventor, who believes the time will come when a person at one 
end of the line can speak one language and that any language can be heard at 
the other end. That is, any language the telephone is provided with. The draw- 
ing shows the Michigan man’s idea of how it is to be accomplished. Attached to 
the side of an ordinary telephone receiver is a box in which the different lan- 
guages are stored. If the man answering the phone is Chinese, he will move 
the indicator opposite the Chinese language, and then some mysterious mechanism 
yet to be designed will receive the English words coming over the wire and 
convert them into Chinese. 

Mr. FULTON. I think on page 4 of Dr. Waterman’s statement it 
is pointed out the reason why the National Science Foundation is in 
this field. You are complementing the work of other Federal agencies 
by emphasizing the supporting projects and longer range, more basic 
research, is that not correct? 

Dr. WATERMAN. Yes. Really our job as we see it, is to look the field 
over carefully, and where the work is going on efficiently, then we 
don’t need to get into it. On the other hand, we can fill the gaps we 
may find, and in general our role concerns this longer range, more 
precise work which is less likely to be accomplished by other agencies 
with more pressing needs. 

Mr. FULTON. On page 6, then, at the end of your statement, Dr. 
Waterman, you state: “We are convinced that the promising results 
obtained thus far warrant the continuation of support of research on 
mechanical translation by the National Science Foundation and other 
Federal agencies.” 

Do you feel firmly convinced of that? 

Dr. WATERMAN. I think there is no question of that, Mr. Fulton, 
because the progress made today shows two things: 

One is that it is amazing how much has been done. 

And the second is, how difficult it will be to do a really finished job. 

Mr. FULTON. How does Dr. Adkinson feel on that important point? 

Dr. ADKINSON. I feel the same way, sir. I feel it is an area which 
we need to pursue vigorously. There is a great demand for rapid 
translation, and accurate translation. There is a great demand by 
some agencies for an indication of what is in the material. And 
machine translation shows promise to date of being able to fulfill this 
need much faster than the human translator, but we have many prob- 
lems yet to solve. Therefore, we need to continue to support the 
research. 

Mr. FULTON. How do you get current coordination and correla- 
tion among the people in various agencies in the Government work- 
ing in this field? 

Dr. ADKINSON. As we stated, we let the other people know whenever 
we have proposals for research in this field, and we get their com- 
ments and they are aware of what we are doing and considering and 
they keep us informed of their intentions of supporting research in 
this field. Then there has been this group of the military and the 
intelligence agencies, and then our informal meetings of Federal 
sponsors of research plus the formal group we set up in 1960, plus 
meetings of the administrators of Federal research programs, at long 
intervals to discuss all the problems with the directors of the research 
projects; plus the meetings of research workers that we have men- 
tioned here—the one in February 1960 and the one we are going to 
have this summer, which enable the research people who are actually 
doing the research to discuss their findings, exchange ideas; and all 
this I think brings about pretty effective cooperation. 

Mr. FULTON. So you do really have a joint and a cooperative pro- 
gram for research and development in this particular field? 

Dr. ADKINSON. Right. 

Dr. WATERMAN. Yes, sir. 

Mr. FULTON. I am glad to have both of you answer “Yes” on that. 

There has been a comment in Dr. Waterman’s statement on page 6 
of the semiannual report entitled “Current Research and Develop- 
ment in Scientific Documentation,” which you mentioned. Will you 
please give us a short résumé for the record of what that report con- 
tains, and likewise give us reference, by citation, as well as location 
of other reports of other agencies of Government on this same basis. 

I think we should also have in the record, as a research matter, the 
reports on that of any other Government or governmental agency 
abroad, so we have the bibliography on it. 

(The information requested is as follows:) 

Since 1957, the National Science Foundation has published a semiannual 
report entitled “Current Research and Development in Scientific Documenta- 
tion.” Its purpose is to enable each research and development group in this 
relatively new field of scientific documentation (the processing and com- 
munication of scientific information) to know what other groups are currently 
working on and what publications and reports are available. The latest in this 
series of reports, No. 6, is now being printed and is expected to be ready for 
distribution by May 20, 1960. A copy will be submitted to the subcommittee 
as soon as possible. The report describes all current research and develop- 
ment projects and studies known to the Foundation in the fields of informa- 
tion requirements of users, information storage and retrieval, mechanical trans- 
lation, and development of special-purpose equipment for handling scientific 
information. In report No. 6, the “mechanical translation” section contains 
descriptions of 17 research projects—11 in the United States and 6 in Europe,
Israel, and Japan. The report also lists the publications and reports that
describe the work of each group. No information is included about current 
activities in the Soviet Union, because we have been unable to obtain informa- 
tion other than published accounts of the work that has been done; these Soviet 
papers have been translated and distributed to interested organizations and 
individuals. 

To our knowledge, no other Federal agencies are issuing any comparable 
reports that describe current research and development projects in this field;
and we know of no similar reports by foreign governmental agencies. The sub- 
committee may be interested, however, in two other U.S. Government reports, 
which have surveyed the work that has been done in the field of mechanical trans-
lation (a copy of each is being submitted for the subcommittee’s files):

“‘Report on the State of Machine Translation in the United States and Great 
Britain,’ by Y. Bar-Hillel, dated February 15, 1959. This survey was prepared 
for the U.S. Office of Naval Research, Information Systems Branch, Contract 
No. NONR-2578 (00), NR 049-130. 

“‘Survey of the Field of Mechanical Translation of Languages,’ by George 
W. Reitwiesner and Martin H. Weik, dated May 1958. This is Memorandum
Report No. 1147 of the Ballistic Research Laboratories, Aberdeen Proving
Ground, Md. The survey was conducted at the request of the Office of Ordnance
Research, Department of the Army.” 


Mr. FULTON. Might I then refer to something else in this language
research? Then I am through.

The CHAIRMAN. Will the gentleman yield there?

We are not going to be able to finish by the time the Congress 
takes in, which is about 2 minutes off. We have Mr. Richard See, of
the documentation research program of the National Science Founda-
tion; Dr. Victor H. Yngve of the Research Laboratory of Electronics
at Massachusetts Institute of Technology; and Dr. Anthony G. Oet- 
tinger, Computation Laboratory of Harvard University. 

Mr. FULTON. I just have one more point I want to make, and I am
through.

The CHAIRMAN. Well, before we go further, though, I think this:
that we ought to meet, if we can, this afternoon about 2:30 to try to 
hear these out-of-town witnesses, especially; and then have ample 
time in which to question them, too. 

All right, Mr. Fulton. 

Mr. FULTON. I will be glad to meet then.

I noticed in Dr. Adkinson’s statement he used the word “methodol- 
ogy.” What is the difference in definition between the word “method” 
and “methodology”? 

Dr. ADKINSON. Probably none, sir.

Mr. FULTON. Now, then, on your statement, too, I went to college,
and learned language. I sometimes wonder about what we are speak- 
ing. In Mr. Adkinson’s statement on page 1 it says: “Computers have 
a means of carrying out statements of natural languages in order to 
learn more about language itself,” which is a distinction between 
natural language and language. On page 4 it says: “Processing nat- 
ural language by machine for a wide variety of purposes.” Again, on 
page 4, it says: “In this larger context of the automatic processing of 
natural language before its significance can be fully understood.” 
And on page 5: “A greater understanding attained concerning lan- 
guage in general.” 

And then on page 6 it says: 


A group of researchers at the University of Pennsylvania is studying the
problem with respect to English, which is another language.

And then on page 6, further down, it says—


must take into account the progress in the larger field of the automatic process-
ing of natural language.

I would ask, Dr. Adkinson, in relation to the statement that had 
been given earlier by Dr. Waterman, what was Dr. Waterman speak- 
ing: Was he speaking English; was he speaking natural language, was 
he speaking language, or was he speaking—— 

The CHAIRMAN. American.

Mr. FULTON. I have gone through various other elements, but what
was it he was speaking?

Dr. ADKINSON. He was speaking a natural language in English.
Some people call it American English.

Mr. FULTON. When we get all these very fine distinctions made,
they have no real basis in practical use. Maybe we are going into a 
field where on trying to get some of these distinctions worked out 
we are losing the main course and purpose of what the research and 
development is. I have a serious purpose in pointing out I would like 
to make sure we aren’t just going down blind ends and blind avenues 
and blind alleys on minor distinctions that in the long course of the 
research and development mean nothing. 

You see, if we are going to build up a whole lot of scientific distinc-
tions here, and then try to maneuver to see where we are, we are
making tremendous difficulties in this field. Are we doing that in
research and development?

Dr. ADKINSON. I do not believe we are, sir. In using the words
“natural language”, I had fallen into a usage that has been used in
this field, where we distinguish——

Mr. FULTON. Methodology——

Dr. ADKINSON. Where we are distinguishing between the natural
language you and I use, English, if we speak English, and language 
for the machine, a distinction we make when we are talking about 
language in the mechanical translation field. 

Mr. FULTON. I am trying to pin you down on this.

Dr. ADKINSON. Language has to be formalized so the machine can
understand it. The machine has to have instructions that it can 
understand, and it does not understand the language as we speak it. 

Dr. WATERMAN. There was an article to illustrate this point in
the Manchester Guardian, where two love poems were written by a
machine. Well, they were anything but poetry. However, they said
the great merit of this was at least they looked like love poems, and
the machine could write 200 a minute, if anyone wanted them, all
different. That is not a natural language.

Mr. FULTON. Of what did you think you were speaking in your
statement?

Dr. WATERMAN. English, which is natural to me.

The CHAIRMAN. The subcommittee will adjourn until 2:30. We
want to give everybody ample opportunity.

(Whereupon, at 12:05 p.m., a recess was taken to 2:30 p.m.)

AFTERNOON SESSION


The CHAIRMAN. The subcommittee will come to order.

This morning we had the pleasure of hearing from Dr. Waterman 
and Dr. Adkinson. 

Dr. Adkinson, I would like to ask you a question or two in reference 
to one feature of this program. 

Once these machines are developed, should one new agency or sub- 
organization be assigned the national job of translating all docu- 
ments in one center? Should the center have suborganizations for 
we'd say agriculture, economics, politics, political science, physical 
science, engineering, medicine, intelligence, and so forth? What is 
your view in that respect?

Dr. ADKINSON. It’s my view that we should proceed the same as
we are doing with human translators; that there will be agreement 
reached as to the responsibilities among not only the Government 
agencies but also among private organizations interested in trans- 
lating with machines if they become useful. I think we will have to 
come to an agreement on responsibilities. But I wouldn’t envision 
that at first with the machines we would translate everything, because 
the scientists and the engineers and the administrators are having 
enough trouble now reading. I hope they will translate the impor- 
tant things. This will vary, depending on whether you are an intelli- 
gence agency; whether you are a military agency without intelligence 
responsibility, say with research and development responsibility;
whether you are the Department of Agriculture, or whether you 
are the Department of the Interior. So I think there will have to be 
an agreement on areas the same as there is agreement on areas today 
and a central file on what is being translated and what has been 
translated. 

The CHAIRMAN. Well, tell me, is that machine very expensive?

Dr. ADKINSON. It depends on the machines used. Most of the com-
puters are expensive.

The CHAIRMAN. When you say they are expensive, what do you
have reference to? Just in general figures.

Dr. ADKINSON. Oh, $1 million or $2 million.

The CHAIRMAN. $1 million or $2 million for a machine?

Dr. ADKINSON. Yes, but I was not thinking necessarily of a machine,
for instance, in one agency, just for translating. They may use it 
for other purposes, and only use it part of the time for translation 
purposes. 

The CHAIRMAN. What other purposes would the machines be used
for?

Dr. ADKINSON. Computing—these machines are computers, and
they are used now for that purpose. They are used for inventory
purposes, manpower control, and other aspects of information han-
dling as well as translation. There are many uses.

The CHAIRMAN. The machine would be useful for other purposes?

Dr. ADKINSON. That is right.

The CHAIRMAN. Manpower—what do you mean by manpower?

Dr. ADKINSON. Personnel control records.

The CHAIRMAN. Oh, yes, keeping records generally.

Dr. ADKINSON. That is right.

The CHAIRMAN. In any Government office?

Dr. ADKINSON. In other words, these machines to date are not
unique; they are computers that can be used for many different 
purposes. 

The CHAIRMAN. What is the unique feature about the translation?

Dr. ADKINSON. The unique feature about the translation purpose is
the understanding of the language that is required, and the program 
for using the computer to prepare translations, the understanding of 
how to instruct the computer to handle the language that is to be trans- 
lated and transferred into the language that you want to read. 

The CHAIRMAN. You would have, then, computers in a central
agency like, we'd say, the National Science Foundation, and then you 
would have computers in other agencies where there is need for trans- 
lation or keeping of tabulated office records such as you indicated? 

Dr. ADKINSON. Yes.

The CHAIRMAN. Then as far as the translations are concerned, you
would have copies of the translations sent to the other agencies
through the central agency so that they would have widespread dis-
tribution, is that right?

Dr. ADKINSON. I wasn’t thinking of a central agency, because there
are at the present time computers available in the Government, and 
they are being used; also there are computers in universities, and they 
are being used. 

The CHAIRMAN. Can those computers that are presently available
be used for translation?

Dr. ADKINSON. Some of them could be, yes.

The CHAIRMAN. They will have to be readapted though, won’t they?

Dr. ADKINSON. No; not as far as I understand. There may be some
developments so that there would have to be some additions. If this 
comes up then we will have to take another look at the situation; for 
example it might develop that larger memories will have to be put 
onto the conventional computer, or a new computer built with a larger 
memory; then we would have a reappraisal of the situation.

The CHAIRMAN. Would you have the National Science Foundation,
however, still retain a sort of directory influence or a guiding influence,
in reference to the translation efforts?

Dr. ADKINSON. We would assume that we had a responsibility to
get the people together to discuss the problem and come to some mutual 
agreement that everybody would want to work with, the same as we 
have on the support of research on mechanical translation. 

The CHAIRMAN. Mr. Fulton, you were questioning the witness when
we rather abruptly stopped today at noon.

Mr. FULTON. I can take mine later.

The CHAIRMAN. We would rather finish yours. I had already
promised to recognize Mr. King before Mr. Bass arrived.

Mr. FULTON. I had referred to the Popular Mechanics magazine,
and that had been in relation to the start of a translating or interpret-
ing machine. It was in the 1910 issue of the Popular Mechanics maga-
zine which printed the prediction of a Michigan inventor. He said a
telephone would one day convert a conversation from one language to
another, with some “mysterious mechanism yet to be designed.” The 
editor of that day called it a very distant likelihood. It was published 
in the February issue of 1910 with a picture of a telephone remarkably 
like our present dial system. 

The CHAIRMAN. Are you sure it wasn’t a Pennsylvania man?

[Laughter.]

Mr. FULTON. May I just put the article in the record related to the
point where we had discussed it? If this is a give and a take, I would 
like to point out something in the Mutual Security Act of last year. 

I had an amendment which was to assist in the use of foreign cur- 
rencies counterpart funds on the translations of documents, treatises, 
and monographs, of a scientific and research nature, and was likewise 
to aid in the dissemination of scientific and research material.

Have you people thought of using any of the tremendous amounts 
of foreign currency that the U.S. Government has on this kind of a 
project? Because we have it in countries that now are evidently 
taking an interest in this program, and we have quite a few scientists 
that certainly could be helped with local currency in the countries 
where they are if we had a cooperative type international program 
on this. That is under Public Law 480, as well as under the ICA 
program. 

Dr. ADKINSON. Yes. Mr. Fulton, we received, you know, in 1958,
I believe, or 1959, an appropriation authorization to the President for
$1,200,000 for the purpose of translating materials with funds in cer-
tain foreign countries. As of today we have three projects going.
One in Israel, where they are translating several thousand pages, I 
think it is in the neighborhood of 20,000 pages, under contract, from 
Russian to English. We have another one in Poland, where they are 
under contract to translate around 17,000 pages from Polish to 
English, of scientific work, and a third one that is just getting under- 
way in Yugoslavia, to do one or two of their languages, languages 
of central Europe, where there is good scientific information. 

These projects are developed. These funds will run out this coming 
year. We have requested additional funds and the House did not see 
fit—the House Appropriations Committee, and the House did not see 
fit to include that in our appropriation. 

Mr. FULTON. Could I comment there?

My method of doing it was that the funds would not be appro- 
priated to you in dollars as part of your appropriation, but through 
the appropriations subcommittee that we have been working with on 
independent agencies, they would have a set-aside for you. If you 
will let me here, I will certainly take it up with them, because I was the 
fellow that helped set up the system doing this in addition to what you 
are doing. 

But the point I’m making today is: Why not an extension into this 
field over and above the programs you and I both know of which are 
the regular translation programs? Why not move that sort of thing 
over into this field, too, that we are speaking of today?

Dr. ADKINSON. Our request to the House was for funds over and
above our regular funds for the programs that have been going on in
translation.

Mr. FULTON. Did you ask for that in dollars?

Dr. ADKINSON. We are required to ask for them in dollars under
section 104(k) of the Mutual Security Act, which requires we make 
the request in dollars. 

Mr. FULTON. But in my particular amendment I think you will find
this is the first time they can be appropriated to you in kind, in foreign 
currency, rather than in dollars, and that is why I have always been 
pointing it out; you always get stuck on your budget when you ask for 
it in dollars. So I think it has been a misdirection, and if you would 
ask for it in funds, and ask some of us—I am particularly interested— 
we will certainly help you get it. 

Mr. RUTTENBERG. I think, Mr. Fulton, if I may speak to that,
under present law, as Dr. Adkinson mentioned, section 104(k) of 
the Agricultural Trade Development and Assistance Act of 1954, 
requires a specific appropriation of the amounts used under it. 

Mr. FULTON. Yes, but I have spoken to you also under the Inter-
national Cooperation Administration, the Foreign Aid Act, which
is mutual security, and under that, because of my amendment ex- 
tending it, you can get it notwithstanding the fact that Public Law 
480, the Agricultural Assistance Act, requires it in dollars. I took 
that provision and put it over under the Mutual Security Act and 
thought I had worked out a method of using the foreign funds in 
kind as a part of foreign aid. You see that is a much different ap- 
proach; I had taken that particular subsection (k), and had changed 
it by amendment. In fact, I should give credit to Mr. Humphrey, 
too, even in his present position of becoming a good Senator again.
[Laughter.]

Dr. ADKINSON. Mr. Fulton, we certainly will look into this, and
explore this.

Mr. FULTON. You will have to check with both Senator Humphrey’s
office and mine.

Dr. ADKINSON. I will do that.

Mr. FULTON. Because of the combination of the language both the
House put it in—the Foreign Affairs Committee level—and Mr. 
Humphrey made some changes and agreed to it. 

There are tremendous funds that can be used. I’m just sorry to 
see them lie there without being used when they could be put to good
purposes like this and much further than on a dollar basis. That
is all.

The CHAIRMAN. Mr. King?

Mr. KING. Dr. Adkinson, I listened to your testimony this morn-
ing with great interest. I was intrigued but a little puzzled and
maybe even a little alarmed over some of the implications of the 
idea of making translations from foreign languages by means of a 
mechanical implement. 

I would like to ask a question or two and have you discuss it for 
the record. 

This is the thing that is disturbing me: I have had a little experi-
ence in foreign languages myself, principally in French. I’ve had a
little experience in translating—not a lot, and not professionally, 
but at least I know the problem. When you start out in the language, 
or if you had very little experience in a foreign language, you as- 
sume that there are equivalents for an English term and then you 
seek the French or the Russian equivalent. You go to a dictionary
and sometimes the dictionary provides you with what purportedly
is an equivalent. Then you use it and find out to your dismay later 
that that is an equivalent only in one situation but not an equivalent in 
another situation, and that the current usage of that language in 
fact forbids the use of that term in that particular context. Lan- 
guages, really they are a living thing, like a flower, they reflect the 
culture, the thinking, the religion, the history, the habits, the pat-
terns of the people that produced that language, and sometimes it
is almost impossible to work out exact equivalents. A skilled trans- 
lator realizes that instead of trying to find a word-for-word equiva- 
lent, what he has to do is to back up and approach the sentence from 
an entirely different point of view, trying to convey the same emo-
tional and intellectual impact, but perhaps using entirely different
words, because of the different emotional impact that those words
happen to have. Words have connotations as well as denotations.

For example, the English word “boy,” in French, “garçon,” most
of the time they are equivalents, but not always. If you shouted at
an adult man “boy” he is a little bit insulted, you are degrading
him, but in French it is perfectly acceptable under some circumstances.

Dr. ADKINSON. Right.

Mr. KING. Then the French have a different approach to certain
problems. When they are talking in superlatives, they use the sub-
junctive mood because that is the mood which denotes potentiality
rather than fulfillment. In English, or American English, we have 
very little use for the subjunctive. We sort of blunder through and 
don’t recognize the distinction between the potentiality and the 
actuality. 

So you have an entirely different philosophic approach in these two 
languages. 

Now, my question is, that in a mechanical translation, aren’t you 
sacrificing all of the subtleties, all of these qualities that make a lan- 
guage a living thing rather than a dead thing, and aren’t you coming 
up with something that is sort of a linguistic monstrosity that perhaps 
technically is accurate or correct, but in which all of the life has gone 
out? That disturbs me. I see in our society a tendency to try to put 
everybody into classifications. We classify people according to 
whether they are big or little or fat or lean or what have you, bearded 
or clean shaven, but people cannot be classified that simply and 
neither can thoughts and neither can language. And I am just 
wondering what this would lead to. 

I just raise that as a question.

Dr. ADKINSON. Mr. King, you are getting into a part of this where
we should perhaps call on experts—and I am not an expert in lin- 
guistics or in translation. But I can give you my viewpoint on it, 
from the discussions I have heard. I can call on Mr. See in my office 
who knows more about how the problems of handling this aspect of the 
language might be solved. I can assure you, however, that everyone
working in this field is worrying about this problem that you men-
tioned and explained so well just now.

In other words, how do we keep the meaning? A word-for-word 
translation may work, and again, as you say, it may not bring out the 
meaning at all of the author. This is the problem that all the trans-
lating projects are well aware of—semantics, idiomatic expressions,
and other aspects of the living language. These things they are
struggling with, and this is a real problem when you try to tell a
machine how to handle this aspect of language. It is a real problem.

Mr. KING. It is a very tremendous problem as I see it.

Dr. ADKINSON. That is right.

Mr. KING. One other aspect I didn’t mention, languages change,
right before our very eyes they change. Fifteen years ago if I said 
you had an atomic brain, that would be an insult. The atom is the 
next to the smallest thing known to man. 

Now, if I say you have an atomic brain, I’m complimenting you, 
I’m suggesting that your brain has all the properties of an atom 
bomb, it is explosive. 

Dr. ADKINSON. That is right.

Mr. KING. Well, the word “atomic” has acquired a completely
different connotation just in the last 15 years. We have seen it almost
literally before our very eyes completely change its meaning, at least
its connotation—not its technical meaning but the current meaning.

Dr. ADKINSON. Right.

Mr. KING. You get this machine all set up in 1 year and in 10
years some of your concepts have just slipped out from under your
feet and you have got to revise them.

Dr. ADKINSON. In their translation procedures and in their diction-
aries they would not be fixing the dictionary so that they could not
change the words or add additional meanings or connotations to a 
word. Changes could be made. This is something they are working 
toward. In other words, we have to recognize that a language is 
living, that new words are coined, that words are used with different 
connotations. 
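
(Editorial illustration, not part of the hearing record. The point that a machine dictionary need not be fixed, and that new meanings can be added to a word over time, might be sketched as below. The class name, the glosses, and the years are hypothetical and are chosen only to echo the "atomic" example; they do not describe any project mentioned in the testimony.)

```python
from collections import defaultdict


class RevisableLexicon:
    """Toy dictionary whose entries can gain senses over time (hypothetical)."""

    def __init__(self):
        # word -> list of (year_added, gloss), kept in insertion order
        self._entries = defaultdict(list)

    def add_sense(self, word, gloss, year):
        """Record a further meaning for `word` without discarding older ones."""
        self._entries[word.lower()].append((year, gloss))

    def senses(self, word, as_of=None):
        """Return the glosses for `word`, optionally only those added by `as_of`."""
        return [gloss for year, gloss in self._entries.get(word.lower(), [])
                if as_of is None or year <= as_of]


lexicon = RevisableLexicon()
lexicon.add_sense("atomic", "extremely small", year=1945)       # illustrative date
lexicon.add_sense("atomic", "explosively powerful", year=1960)  # illustrative date
```

(Asking for the senses as of an earlier year returns only the older meaning, while a later query returns both; the machine's dictionary is revised by adding entries rather than by rebuilding it.)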

This is one of the problems that the mechanical translation groups
have to worry about all the time, especially in the sciences, for we are 
moving ahead so fast. In the Russian language, for instance, they 
have coined a lot of words, they have adopted a lot of words from 
English and other languages, and this means that we have a constant 
problem here. 

All I can say is I can assure you they are aware of this problem, 
they are struggling with it, they are trying to do the best they can 
with it, but it is a very difficult problem. 

If you would like a further explanation on the technical side I can 
ask Mr. See to come up and explain some of the things they are doing 
along this line. 

The CHAIRMAN. We have statements from Mr. See a little later,
don’t we?

Dr. ADKINSON. He is here to answer questions, Mr. Chairman. But
I am sure some of the other members who are going to testify will 
speak to this problem, Mr. King, as they testify. 

Mr. KING. I am interested in this aspect of it. I won’t pursue it
now, Dr. Adkinson, since you have deferred to others, but as the 
others come up I would appreciate their dealing to some extent with 
that problem. 

Dr. ADKINSON. I am sure they will.

Mr. KING. Because it is one of interest.

The CHAIRMAN. Mr. Bass.

Mr. BASS. Mr. Chairman, we have two other witnesses.

The CHAIRMAN. Yes, we have Dr. Yngve and Dr. Oettinger.

Mr. BASS. Mr. Chairman, in the interest of getting on with this
hearing, I will forgo my questions, although I would like to explore
some angles with Dr. Adkinson.

Dr. ADKINSON. Thank you.

The CHAIRMAN. Why not do this—the gentleman is very generous—
why not hear from the two doctors, we have one from MIT and one 
from Harvard, and then in such time as we have left we can explore 
further questions. 

Dr. ADKINSON. Thank you.

The CHAIRMAN. If there is no objection, Doctor.

Dr. ADKINSON. That is fine.

Mr. FULTON. In the interest of getting along with the witnesses,
can we have one of the staff translate Louisiana into English?
[Laughter.]

The CHAIRMAN. Well, Dr. Victor H. Yngve and Dr. Anthony G.
Oettinger, will you gentlemen come up?

Dr. Victor H. Yngve, from the Massachusetts Institute of Tech- 
nology. 

Doctor, you have a prepared statement here and we would be pleased 
to have you present your statement in the interest of saving time. 


STATEMENT OF DR. VICTOR H. YNGVE,2 MASSACHUSETTS
INSTITUTE OF TECHNOLOGY 


Dr. YNGVE. Thank you, Mr. Chairman.

Mr. Chairman and members of the committee, I appreciate the 
opportunity to be the first scientific investigator on your list of wit- 
nesses on the topic of mechanical translation. Since this position has 
fallen to my lot, I feel it would be most helpful if I reviewed for you 
the field as a whole and the extent of progress as I see it, as well as the 
way in which our own work fits into this broader picture. I shall 
proceed by trying to answer some questions that are frequently raised. 

People often ask why we are interested in the eventual possibility 
of translating from one language to another by the aid of machines. 
It has, of course, great intrinsic scientific interest. Since a sizable 
portion of the funds being expended today in this area comes from 
military budgets, I assume that some people feel that there would be 
extensive military applications. Without appearing to contradict this 
view, and without denying the possibility of other aims within the 
military, I should like to state that in my view the eventual develop- 
ment of mechanical translation will have a far greater impact in its 
peacetime applications than it will have in military applications. May
I expand on this briefly?

The world is divided by language barriers into about 4,000 linguistic 
communities. Many of these communities are small and represent 
primitive or underdeveloped cultures. But well over 50 of these lan- 
guage communities are large and important enough to carry on exten- 
sive trade, communication, and cultural interchange one with another. 


2 Victor H. Yngve: Born July 5, 1920, Niagara Falls, N.Y.; married, 8 children.

B.S., physics, Antioch College, 1943; U.S. Army Medical Department, 1943-46; S.M.,
physics, University of Chicago, 1950; Ph. D., physics, University of Chicago, 1953.

Mechanical translation research, MIT, 1953 to date. 

Coeditor of journal “Mechanical Translation,” published at MIT, now in sixth year. 

Director, Mechanical Translation Group, MIT.

All interchange between language communities must now funnel
through individuals who are to some extent bilingual. The resulting
bottlenecks serve to stifle such intercourse and to keep the language
communities in comparative isolation. It has been suggested that the
adoption of a single universal second language—either a natural lan- 
guage or an artificial one—would offer a solution. I agree that this 
would be ideal. The various language areas could continue to main- 
tain their linguistic integrity at the small expense of having to learn 
only one second language in order to communicate with the rest of the 
world. There have been hundreds of attempts to introduce such a
universal second language, and they have all failed because it has 
been impossible to get enough people to agree on the idea or on the 
choice of a language. So we are left with the necessity for a great deal 
of translation in order to conduct the day-to-day business of normal 
peacetime activities. In this area, which includes scientific and tech- 
nical communications as well as other kinds, the eventual possibility 
of mechanical translation could be a great boon. 

Mechanical translation would involve the use of the automatic digi- 
tal computer—either one of the existing general-purpose machines or 
eventually, when we know more about it, a special-purpose machine.
The machine operations involved are allied to other types of informa- 
tion processing by machine. They include access to dictionaries and 
tables stored in a large machine memory, and appropriate automatic 
manipulation and processing of words and items. The only trouble is 
that we don’t yet know how to do it. 
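
(Editorial illustration, not part of the hearing record. The machine operations named above, access to a stored dictionary followed by automatic processing of words, can be sketched in their most naive form as a word-for-word lookup. The glossary entries and function names are hypothetical and describe no system discussed in the testimony. Taking the first listed equivalent for every word is exactly the shortcut that the witnesses warn does not yield adequate translation.)

```python
# Hypothetical toy glossary: each source word maps to one or more
# possible equivalents. Entries are illustrative only.
GLOSSARY = {
    "le": ["the"],
    "garçon": ["boy", "waiter"],
    "lit": ["reads", "bed"],
    "un": ["a"],
    "livre": ["book", "pound"],
}


def word_for_word(sentence, glossary=GLOSSARY):
    """Replace each source word with its first listed equivalent.

    Words missing from the glossary are passed through in brackets.
    No grammar or context is consulted.
    """
    output = []
    for word in sentence.lower().split():
        equivalents = glossary.get(word)
        output.append(equivalents[0] if equivalents else f"[{word}]")
    return " ".join(output)


def ambiguous_words(sentence, glossary=GLOSSARY):
    """List the source words for which the glossary offers more than one equivalent."""
    return [w for w in sentence.lower().split()
            if len(glossary.get(w, [])) > 1]
```

(In this sketch a simple sentence happens to come out acceptably, but `ambiguous_words` shows how many of its words required a choice that the lookup made blindly; choosing among equivalents by context is the part that remains unsolved.)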

There has arisen in the past 10 or 12 years a small band of pioneers 
dedicated to finding out how to instruct a machine to produce satisfac- 
tory translations. Support has been adequate. I know of no one with 
a sensible research proposal ever being for long without adequate sup- 
port. But at the same time, I feel that the supporting agencies have 
been sufficiently careful in screening applications so that there has been 
very little if any waste of Government funds on ill-conceived projects. 

Work in mechanical translation can be separated into three parts:
Science, technology, and production. Under science we have research 
directed toward the discovery of the basic facts and knowledge of 
languages and translation that will form a firm foundation for erecting 
a technology. Under technology we can include research leading to 
the development of the dictionaries and machines that our science tells 
us how to build. Under production we would of course contemplate 
actual use of the technology for the production of useful translations. 

Taking these three areas of endeavor in reverse order, I would like 
to say a few things about each. First, on production. I know of no 
system of mechanical translation either existing or proposed, that 
would be capable of yielding adequate translations in the next few 
years. By adequate translations, I mean translations competitive with 
those made by humans. There are, of course, a number of systems un- 
der development. They all have serious defects, as their proprietors 
would be the first to admit. The Government should be exceedingly 
cautious in assessing any scheme of mechanical translation that is al- 
leged to be ready for production. The reason is that it is very difficult 
to assess accurately the merit of a less-than-perfect translation system. 
May I repeat, I know of no system that would be capable of yielding 
adequate translations in the next few years.

Second, in the area of development, there are a number of groups
throughout the country working very hard. A lot of good work is being
done. These groups are building a technology. They are putting
dictionary information into computers; they are building machines. 
This work is important. They are experimenting with sentence analysis
and playing with semantics. Their motto is “Let’s do now what
we know how to do. Maybe someone will be able to use it.” Maybe. 
But I suggest extreme caution because the results at present are not
good enough. If someone wants to use one of these systems, let him do
it with his eyes open, counting the costs and counting the error rate. 

Prospective customers are likely to say that a less-than-perfect 
product will be satisfactory; perhaps a product with 90 or 95 percent 
accuracy. We hear such figures. Any quoted percent accuracy 
means very little, however, because of the difficulty of assessing less- 
than-perfect output. The trouble is that even if errors could be 
counted, it is difficult to determine the relative amount of loss caused 
by different kinds of errors. Some errors are not serious, others are 
very serious. But if I were pressed for a figure I would say that 
realistically we can’t reach an accuracy of 50 percent at present. But 
even if we could achieve 95 percent accuracy what would it mean? 
Would it mean that we would miss the 5 percent of important new 
material and get the 95 percent of already known material? There 
is some indication that this would be the case. 

I don’t want to be misunderstood. I am not advocating perfection 
where perfection is perhaps unattainable. But would we tolerate a 
mechanized bank accounting system that guaranteed 95-percent ac- 
curacy in crediting and debiting the accounts? Would we tolerate 
voting machines that guaranteed a 95-percent accuracy in totaling the 
vote? Would we tolerate a mechanized post office that would guarantee
correct delivery of 95 percent of the letters? The 5 percent of
error might be caused by the failure of simple rules of thumb for 
treating the multiple meaning of words like “Washington.” Such a 
rule of thumb might be: “Send a letter with ‘Washington’ in the ad- 
dress to Washington, D.C. if the letter is mailed east of the Mississippi, 
and to the State of Washington if mailed west of the Mississippi, and 
ignore such infrequent meanings as Washington, Ga.; Washington, 
Ill.; Washington, Ind.; Washington, Iowa; Washington, Mo.; Washington,
N.J.; Washington, N.C.; and Washington, Pa., since the error
rate will scarcely be affected.” It is such rules for dealing with the 
multiple meaning of words that lead to unreliable translations. If 
someone can use such a system, I will not object, but let him look it 
over carefully first. 
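
The "Washington" rule of thumb above can be sketched in a few lines to show how a high overall figure can hide destinations that are always wrong. This is a minimal illustration only: the function names and the sample counts are hypothetical and do not come from the testimony.

```python
def route_washington(mailed_east_of_mississippi: bool) -> str:
    """Rule of thumb: choose the destination from the point of mailing alone."""
    if mailed_east_of_mississippi:
        return "Washington, D.C."
    return "State of Washington"


# Hypothetical sample: (intended destination, mailed east of the Mississippi?, letters)
sample = [
    ("Washington, D.C.", True, 60),
    ("State of Washington", False, 35),
    ("Washington, Pa.", True, 3),
    ("Washington, Iowa", False, 2),
]


def accuracy_by_destination(rows):
    """Return (overall accuracy, accuracy per intended destination)."""
    total = sum(n for _, _, n in rows)
    correct = 0
    per_destination = {}
    for intended, east, n in rows:
        hit = route_washington(east) == intended
        if hit:
            correct += n
        per_destination[intended] = 1.0 if hit else 0.0
    return correct / total, per_destination


overall, per_destination = accuracy_by_destination(sample)
print(f"overall accuracy: {overall:.0%}")
for destination, acc in per_destination.items():
    print(f"{destination}: {acc:.0%}")
```

With these assumed counts the rule scores well in aggregate, yet every letter meant for one of the infrequent Washingtons is misdelivered. That is the sense in which a single quoted percentage says little about which material is lost.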

The groups that are developing systems are performing a vital 
service. They are building the technology that we will surely need, 
but it must be on as firm a scientific foundation as can be found. 
The trouble is that there is no foundation at all in certain places. 
This brings us to the third area of mechanical translation research, 
science. The group at MIT and some of the other groups are working 
very hard trying to build an adequate foundation. We are working 
on the science of communication. Our motto is “Let us isolate those 
areas where our routines are imperfect for lack of basic knowledge,
and go after that basic knowledge.” We are seeking to fill in the
gaps and to expand and deepen our understanding. We are seeking 
routines that are intrinsically capable of producing correct and accurate
translations. This will be a long, difficult process, but a necessary
one if satisfactory mechanical translation is ever to be a reality.

At MIT, research on mechanical translation is conceived of as just 
one facet in a rapidly emerging area of study—the communication 
sciences. Pioneering work is being done in a number of other areas 
that fit under the communication sciences. Work is being done in 
artificial intelligence, communication biophysics, communication sys- 
tems, experimental psychology, linguistics, neurophysiology, proc- 
essing and transmission of information, sensory aids for the handi- 
capped, social science, and speech communications. Advances in any 
of these related fields are quite likely to be directly or indirectly ap- 
plicable to mechanical translation. Needless to say, the presence at 
MIT of a number of scientists contributing in the related areas of 
the communication sciences provides an ideal environment for the 
conduct of mechanical translation research. 

Our mechanical translation research has been sponsored primarily 
by the National Science Foundation. Our relations with this agency 
have convinced me that the Congress was very wise in setting up the 
Foundation in the way that it did. They have been able to attract an 
excellent staff. All of the people that I have dealt with at the Founda- 
tion have proved to be exceptionally competent and enlightened public 
servants. While doing their job well, they have interfered in no way 
with the scientific conduct of our research. This very enlightened 
attitude on their part has helped greatly in our being able to maintain 
at MIT an atmosphere favorable to the individual creativity so 
necessary for the nurturing of basic scientific advances. 

We have made a number of advances. The list of areas where we 
have made significant advances is long and is a matter of record. I 
will not bore you with a complete recitation; it is available if you 
want it. I would like to comment, however, on a few items. 

The work on generative grammar and the theory of grammatical 
transformations by A. N. Chomsky represents an important advance in 
linguistics. It provides an approach to syntax that sheds considerable 
light on some questions of meaning. It sets a new standard of scientific 
rigor and precision in the description of languages to replace the old 
approximative and normative type of grammar. 

Recent work by E. S. Klima, in his continuing study of English 
grammar, points out formal features common to many diverse ex- 
pressions that are negative in meaning. These formal features are 
themselves best accounted for by assuming a certain family of gram- 
matical categories that correlates not only with the general notion of 
negation but also with that of varying degrees of completeness of 
negation. The formal features extend from such words as “not,” 
“none,” and “never,” through such prefixes as “un-” and “dis-” to words
commonly characterized as inherently negative like “doubt” and “forbid.”
A correct understanding and treatment of negation will be
essential for correct translation, and for a proper understanding of 
how our language works. 

A number of other important topics have been worked on or are 
under continuing investigation. Practical problems in computer research
have not been ignored. Work on the COMIT system in cooperation
with the MIT Computation Center makes available a powerful
programing language for use in linguistic work. By the use of the 
computer as a research tool, linguists will be able to do many things 
that have hitherto been impossible. 

I should like to conclude by discussing some of my own work. I have 
recently discovered a possible explanation for many previously puz- 
zling complications in sentence structure. It is now possible to explain 
why English has a part-of-speech system; why it has both active and 
passive; why it is that when a verb has two objects, a direct and an 
indirect object, the one that is a pronoun comes first; why we use the 
anticipatory “it” as in “It is true that he went”; why a relative clause 
follows its noun while adjectives precede; why it is that of the two 
genitive markers, the one that precedes is a separate word (of), and
the one that follows is a suffix (’s); why it is that awkward sentences
are awkward. The explanation is quite simple and is now being pub- 
lished in the Journal of the American Philosophical Society. 

Only the surface has been scratched. There are many unanswered 
questions to be investigated: What further generalizations can be 
made in syntax? What is meaning, and how do we encode and
decode messages in a language like English? For what purposes is
language used, and how does the use of language affect translation?
What is the nature of the translation relation and how can it be 
described or specified? To what extent are languages translatable? 
By what methodology can we achieve descriptions of languages? 
What are the detailed facts making up the descriptions of the various 
languages? The answers to these questions will be important not
only for the development of accurate and trustworthy mechanical 
translation, but they will provide new insights into the mechanisms 
of language and thought processes, the very foundation of human 
culture. 

Thank you, Mr. Chairman. 

The Chairman. Thank you very much, Dr. Yngve. We appreciate
your discussion which is really very illuminating, I am sure, to my- 
self as well as to the members of the committee. 

Before we ask you questions, however, we would like to hear from 
Dr. Oettinger, from Harvard University, and have his prepared 
statement in the record. We might be interrupted by a rollcall later 
on, and therefore we want to be sure we are going to hear from both
of you gentlemen.

Dr. Oettinger, if you will proceed, we will appreciate it.

STATEMENT OF DR. ANTHONY G. OETTINGER,* ASSISTANT PROFESSOR
OF APPLIED MATHEMATICS AND OF LINGUISTICS, HARVARD
UNIVERSITY


Dr. Oettinger. Mr. Chairman, members of the committee, my researches
on the problems of automatic language translation, particularly
from Russian to English, began in 1950. They have continued
to the present time, although interrupted for other activities in 1951-52 
and 1954-56. Initially, the effort was a small-scale one, with occasional 
support from such diverse sources as the U.S. Air Force and the 
Harvard Foundation for Advanced Study and Research. In the 
past 3 years, our activities have increased in scale, and are presently 
supported by the National Science Foundation and the U.S. Air
Force. But Dr. Yngve has set the stage on which our work on trans- 
lation is carried out, and I would like to turn to some of the specific 
problems. 

Like all forms of automatic information processing, the process of 
automatic translation may be divided into three phases: input, logical 
processing, and output (fig. 1). In all present experimental sys- 
tems, and these are as yet the only kind, initial input of texts to be 
translated is by manual transcription on some keyboard-operated 
device. Experience has shown that the key punchers need not know 
the language being translated, so that scarce bilingual personnel is 
not necessary at this stage. However, considerations of speed, cost, 
and accuracy suggest that automatic print-reading equipment would 
be helpful, not only at the input to a translator, but in many other 
areas of natural language processing. I understand that equipment 
for reading Russian automatically is being developed. But it should 
be understood that the realization of fast, accurate, and economical 
print-reading equipment is a matter of considerable difficulty. 

Similar problems arise at the output of a translator in the prepara- 
tion of English texts in a form suitable for dissemination. Figure 
1 shows that high-speed printers, of a kind generally available commercially,
are used at present, but these have certain features that
make it rather difficult, for example, to deal with formulas and other 
special kinds of mathematical symbols, and I have indicated that in 
the future it will be helpful to have special facilities for this purpose. 


* Dr. Anthony G. Oettinger: Bachelor of arts (summa cum laude) Harvard, 1951, in
engineering and applied physics. Doctor of philosophy, Harvard, 1954, in applied mathe- 
matics. Held positions of instructor, assistant and associate professor of linguistics and 
applied mathematics at Harvard from 1954 to date. Consultant to the Arthur D. Little, 
Inc.; originated program in mathematical linguistics at Harvard, and has done research
in automatic language translation, information retrieval, logical design of computers,
automatic data processing and learning machines. Speaks and reads Russian, French, 
German, and Spanish.

Problems of input and output are mainly of a technological nature.
Our work at Harvard has had little to do with input and output, 
because existing equipment is adequate for experimental purposes, 
and because the major unsolved problems bearing on the possibility, 
as distinct from the practicality, of automatic translation hinge on 
profound linguistic problems associated with the logical processing 
phase. 

At present the logical processing is done on a variety of general- 
purpose computers, and in the future we may still continue to use 
general-purpose computers, although it may turn out it may be more 
efficient or economical to build special-purpose machines. This is still 
very much an open question. 

It is helpful to consider the process of translation to be decomposable 
as illustrated in figure 2. The analysis of the foreign sentence should 
reveal in explicit form what role each word in the sentence plays and 
how it is related to the other words in the sentence. The information 
so obtained applies directly only to Russian and must be converted 
or interpreted in the translation phase so as to give equivalent infor- 
mation about English. The information about English must then be 
used to synthesize an acceptably grammatical and above all accurate 
English sentence equivalent to the original foreign one. The synthesis 
of English sentences and, to a lesser extent, translation, present con- 
siderably less difficulty than the analysis of foreign sentences. Our 
work at Harvard to date has therefore emphasized the analysis of 
foreign sentences, although some work has been done on the two other 
phases as well.

Three phases in the analysis of a word in a Russian sentence are
shown in figure 3 and will be explained shortly. The lexical phase 
is one to which we at Harvard have given much experimental and 
theoretical attention in the past few years, and where relatively few 
research problems remain, outside of questions linking the lexical to 
the syntactic and semantic phases. The syntactic phase is currently 
the object of the most active research not only at Harvard but also 
at other institutions, notably at the National Bureau of Standards to
which our present approach to syntax owes its inspiration, and at the 
Massachusetts Institute of Technology, whose theoretical and experi- 
mental studies have often beneficially influenced our work. Consid- 
erable progress is being made in the syntactic phase, but satisfactorily 
reliable results cannot yet be guaranteed. With the exception of a 
few isolated techniques, tricks, or gimmicks, there is at present little 
that is understood, at Harvard or elsewhere, about the semantic phase.

The lexical phase consists of extracting information about a word
from an automatic dictionary. The problems here are chiefly those 
of devising methods for compiling and operating automatic diction- 
aries, guaranteeing the accuracy of their contents, determining how 
large they should be, and, since the answer to the last question is 
probably “as large as possible,” obtaining automatic storage devices 
adequate to the task. Much ambiguity remains in the product of 
an automatic dictionary. 

If you will look at figure 3, you will notice a particular illustra- 
tion of one Russian word. The dictionary might tell you that this 
word “napryazhenya” could mean in English “strain,” “tension,” or 
“voltage”; we don’t know which of these three in a particular instance.
The dictionary tells us it is a noun. It might be singular in the genitive
case or else it might be plural in the nominative or accusative case.
Just looking it up in the dictionary doesn’t tell us precisely which.
It is of neuter gender and inanimate. So, as you can see, much am- 
biguity remains in the product of an automatic dictionary. And al- 
though some experiments have shown that this product can be directly, 
profitably, and safely used in certain applications by both monolingual 
and bilingual technical abstractors, engineers, or scientists, reading 
this product entails considerable effort on the part of the reader, who 
is essentially doing most of the work himself. This is essentially 
the realization of the schoolboy’s dream of having much of the dirty 
work of looking things up in a dictionary done for him, yielding a 
“trot,” or a “pony.” But the actual translation is done by whoever 
reads this, although it is useful to schoolboys and seems to be useful 
to others. 
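
The lexical phase described above can be sketched as a plain dictionary lookup that returns every possibility unresolved. This is a minimal illustration built only from the figure 3 example as read into the record. The names `Entry`, `DICTIONARY`, `lookup`, and `trot` are hypothetical, as is the placeholder `unknownword`; the sketch does not represent the actual Harvard automatic dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    glosses: tuple        # candidate English correspondents
    part_of_speech: str
    gender: str
    animate: bool
    readings: tuple       # possible (number, case) pairs


# One-entry toy dictionary taken from the example in the testimony.
DICTIONARY = {
    "napryazhenya": Entry(
        glosses=("strain", "tension", "voltage"),
        part_of_speech="noun",
        gender="neuter",
        animate=False,
        readings=(
            ("singular", "genitive"),
            ("plural", "nominative"),
            ("plural", "accusative"),
        ),
    ),
}


def lookup(word):
    """Lexical phase: return everything the dictionary holds, or None."""
    return DICTIONARY.get(word)


def trot(words):
    """Word-by-word 'pony': list all glosses and leave the choice to the reader."""
    pieces = []
    for word in words:
        entry = lookup(word)
        pieces.append("/".join(entry.glosses) if entry else f"[{word}?]")
    return " ".join(pieces)


entry = lookup("napryazhenya")
print(len(entry.glosses) * len(entry.readings), "gloss/reading combinations remain")
print(trot(["napryazhenya", "unknownword"]))
```

The lookup narrows nothing. It hands back three glosses and three grammatical readings for one word, which is why the reader of such output is still doing most of the translating.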

Removing the residual ambiguity that remains at this stage re- 
quires considering each word not in isolation, but as part of a larger 
context, and this is the function of the syntactic phase. 

As shown in the second column of exhibit 3, syntactic analysis can, 
for example, resolve that a given noun is indeed plural, permitting 
attaching “s” to its English correspondents, and it can also determine 
that the noun is the subject of the sentence, a piece of information 
that is of great potential value in English synthesis, provided it can 
be trusted, which unfortunately is not yet always the case. Syntactic 
analysis still presents many challenging experimental and theoretical 
problems. 



The solution of these problems is of the utmost importance in areas 
other than automatic translation, a point to which I shall return. 
Except in a few special cases, syntactic analysis is powerless to settle 
such matters as, for example, which of the three English correspondents
found in the dictionary is the correct one in a given text. Such
problems are left to semantic analysis. 

You see, I can perhaps tell from syntactic analysis that this word 
“napryazhenya” is in the plural and so I can say it isn’t “strain,”
“tension,” or “voltage,” but “strains,” “tensions,” or “voltages,” but I
have no way yet of knowing which of these three it is, and of course it 
might make a difference. 
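
The narrowing just described can be sketched as a filter over the dictionary's grammatical readings. This is an illustration under stated assumptions: the function `apply_syntax` and the naive add-an-s pluralizer are hypothetical, and the syntactic analysis itself is taken as given rather than computed.

```python
def pluralize(gloss):
    """Naive English plural, sufficient for the three glosses in the example."""
    return gloss + "s"


def apply_syntax(glosses, readings, resolved_number):
    """Keep the readings that agree with the number found by syntactic
    analysis, and inflect the English glosses to match."""
    kept = [r for r in readings if r[0] == resolved_number]
    if resolved_number == "plural":
        glosses = [pluralize(g) for g in glosses]
    return list(glosses), kept


glosses, kept = apply_syntax(
    glosses=("strain", "tension", "voltage"),
    readings=(
        ("singular", "genitive"),
        ("plural", "nominative"),
        ("plural", "accusative"),
    ),
    resolved_number="plural",
)
print(glosses)
print(kept)
```

Syntax removes the singular reading and supplies the plural ending, but all three English correspondents survive. Choosing among them is the part left to semantic analysis.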

The problem of figuring out which of the three it is, if any, is what 
one might lump under the heading of semantic analysis. Semantic 
analysis deals with the elusive concept of meaning, and little that is 
both worthwhile and correct can be said about it at this time. Beau- 
tiful, smooth-looking translations can be produced here and now by 
various techniques all either unsafe or question begging. For instance,
if precisely one English correspondent is stored in the dictionary for
each Russian word, the problem of choosing among several correspondents
obviously disappears; or, alternatively, if correspondents
are labeled by technical field and the one most likely to occur in a
given text in a given field is chosen invariably, then again the problem
disappears. In either case, errors as yet unpredictable in number or
in effect will occur, but the reader is led into thinking that because the
result produced by the machine is smooth English, it must be right.
The dangers of such a situation are obvious and similar to those in- 
herent in the employment of incompetent human translators. 
Another possible technique is to interpose between the machine and 
the consumer a corps of bilingual technically competent editors, re- 
sponsible for checking the work of the machine. Such a technique 
clearly begs the question, and such a staff might well be more profitably
employed translating in the old-fashioned way.

While considerable progress has been made toward methods for 
safe, reliable, accurate automatic translation into easily readable 
English, it would be premature to consider the problem solved. Figure 4
gives a comparative analysis of what can be accomplished, now,
and in the future. There are three columns indicating the kind of
thing that could be obtained from just an automatic dictionary, from
an automatic dictionary complemented by automatic syntactic analysis,
and from a system including automatic semantic analysis.

In conclusion, I should like to stress the more general aspects of
current research on automatic language translation. The advent of 
automatic computing or information-processing machines is creating 
what some have called a “clerical revolution” by analogy to the 
industrial revolution. The information processed by such machines
is necessarily represented in some language, whether English, Russian,
or the specialized languages of mathematical, chemical, and
other notations. These are manmade, or if you will, artificial languages.
Likewise, the instructions to machines also are expressed in
languages specially created for the purpose.

In this light, the achievement of a greater understanding of the 
mechanism of languages is of the utmost importance. Fortunately, 
the very machines which create this need for understanding are also 
the tools to be used toward achieving it. As a result, a discipline 
of mathematical linguistics, concerned with the interrelation of lan- 
guage, mathematics, and information-processing machines is emerging 
not only in the United States, but also abroad. What has already 
been learned about language in connection with translation studies 
is already finding and will find growing applications in such diverse 
realms as automatic abstracting, automatic information storage and 
retrieval, and the design of automatic information-processing ma- 
chines themselves. It is with some of these applications in mind that 
our group at Harvard has recently begun the study of English syn- 
tax, and of certain abstract languages of interest in the design of 
computers. However, in spite of progress to date, most of the basic 
research necessary to achieve an adequate understanding of language 
and most of the development necessary to apply the knowledge gained 
through research in a useful way still lie ahead of us. 

Thank you. 

Mr. King. Thank you very much, Dr. Oettinger.

Mr. Fulton. I defer to Mr. Bass.

Mr. Bass. Thank you, Mr. Fulton. 

Doctor, did you happen to see the article that was in the New York 
Times in today’s issue on this subject?

Dr. Oettinger. Yes; I have had a very brief look at it.

Mr. Bass. Was it substantially accurate, would you say?

Dr. Oettinger. I haven’t had a close enough look at it to be sure.

My feeling is that it would be premature at this stage to say that any 
system for translation is really adequate. This doesn’t mean that 
for limited purposes something cannot be accomplished today, as I 
indicated in my statement. 

Mr. Bass. I would like to explore with you one or two implications 
in this field. 
So far we have been concerned primarily with translating Russian
into English.

Is it theoretically possible to handle this in any other language— 
any other language translation? For instance, Chinese into English, 
or English into Chinese? 

Dr. Oettinger. Well, until fairly recently, there have been so many
difficulties in the way of working with only a pair of languages that 
by and large most of us have worked with a single pair. For example, 
our group has been working from Russian to English, and Dr. Yngve’s 
from German into English.

I think some fairly recent theoretical developments, some of which
happened at MIT, others at the Bureau of Standards and at Harvard,
suggest that we may be emerging with some more general techniques 
that would be applicable to other languages, but it is too early to say 
just how far they will go. All one can say is, that the closer within 
one family a group of languages are, the more likely it is that the same 
techniques would work. 

Chinese, unfortunately, presents some peculiar difficulties of its own, 
and since I don’t know it, I would hesitate to hazard a guess as to 
whether anything we have done for Russian would apply to Chinese, 
except in very general terms.

Mr. Bass. That leads to my next question, namely, are the problems 
involved in translating, we'll say, German into English by one of these 
machines easier than Russian into English? Or is it all essentially the 
same problem?

Dr. Oettinger. It is still very difficult to compare. You see, one
of the major problems is that we are at this time still very largely 
ignorant of most of the details, important details, about the structure, 
say, of German or of Russian, and it’s still too early to make sound 
generalizations. We have some ideas about how to generalize, but a 
generalization without the possibility of checking whether or not it 
applies in particular instances is worse than nothing. It can be quite 
dangerous. But I think sound generalizations will emerge. 

Mr. Bass. Well, at present, then, as I understand you, Doctor, you 
are working just in two fields—two languages, German and English, 
and Russian and English? 

Dr. Oettinger. We are working at Harvard on Russian into English.
We are also doing some work on English itself. Professor
Yngve’s group at MIT is working on German to English, and a 
number of other groups are working on Russian to English and some 
are, I believe, also working on French to English. 

Mr. Bass. I am through. 

The Chairman. Dr. Oettinger, do you have any examples, any
illustration of what you have been able to do to date? Of course, we 
couldn’t translate the Russian, but we could perhaps read the English. 

Dr. Oettinger. I don’t have any with me. I would be glad to supply
such.

The Chairman. I would like to see, for instance, a smooth translation
where you might detect some perhaps latent error in the shade of
the meaning in the translation, or where you could obviously see a 
mistake in the meaning of a word in the translation. 

Dr. Oettinger. Yes. These are easier to produce than any others.
Almost any smooth translation that I could produce today would be 
bound to have errors, many of them obvious, and some of them subtle. 

The Chairman. I think it would be of interest to the committee to
see graphically what you have in mind in reference to errors. 

Dr. Oettinger. All right.

Mr. Bass. Would you yield for one question?

The Chairman. Yes, I yield, Mr. Bass.

Mr. Bass. I just have one question. 

Doctor, as a matter of curiosity, when you feed into one of these 
machines a Russian sentence, how long does it take, in a matter of 
seconds, for the English counterpart to come out at the other end?

Dr. Oettinger. This is a question I can’t answer too well from my
own experience, because our work has been largely experimental, and 
because we have paid very little attention to the question of how to 
make the process as fast and efficient as possible. It probably takes 
our machine longer right now than it would take a competent person 
to do the job. 

I don’t think that there are any serious technological difficulties in 
bringing the speed of the process to a point where it could be economically
reasonable—that is, the speed of computers is something
that is going up far more rapidly than we are able to understand 
what it is we want to use this speed for. 

Mr. Bass. Thank you. 

The Chairman. So that economically there is no problem there.
The problem is that of accuracy?

Dr. Oettinger. Well, I wouldn’t say there is no problem. I would
say that at present the problem of economic feasibility seems so far 
less serious than the problem of accuracy, because I think that if we 
were really sure we knew how to translate accurately, we would prob- 
ably find a way of doing it economically, or at least then face the 
question. 

The Chairman. I read in a statement a good computer could do the
work of 175 efficient translators. What would you say about that?

Dr. Oettinger. Well, “could when?” is the question.

Right now, this is simply impossible, because I don’t think there is a 
computer that can do the work of one efficient translator. 

The Chairman. So would you say it could do the work of 175 poor
translators? [Laughter.]

Dr. Oettinger. I think the work of 175 poor translators had best
remain undone, because it means that 175 expensive technical people 
who read the result then are worse off than if they didn’t have the 
translation in the first place. They have first to realize that a mistake 
has been made, that they have been had, and then go on and do some- 
thing about it. If they don’t understand the original Russian, they 
say “I give up” and don’t waste any time. 

The Chairman. I see I can’t lead you out too far.

Would you say we are making such progress that the prospects 
for immediate production of translation are pretty good? 

Dr. Oettinger. No, I think that the prospects for immediate production
are pretty poor, if by translation you mean something that can
be safely put into the hands of an unsuspecting reader.

I think that if we are talking about various mechanical aids to trans- 
lators, or to readers, the prospects are somewhat better. One can con- 
sider machines at the various stages that I have described in my state- 
ment to be possibly of some usefulness in assisting the work of trans- 
lators or of abstracters, provided these people are not under the mis- 
apprehension that they have been given a reliable translation, and that 
they still have access to the original to check what undeniably will be 
mistakes in whatever the machine produces. 

The Chairman. That is what I wanted to ask you, then.

If one would have the product of the translation before him, could 
that person read the translation, suppose it is an unsuspecting person, 
and determine when the translation is good or when it is bad, when it is 
accurate and when it is spurious?

Dr. Oettinger. This is sometimes extremely difficult. I think we
have very little experience with the output of machines in this respect, 
but the experience with the product of human translators indicates 
how difficult this can be. 

I am sure many people who have had occasion to use translations 
have found that when they got through something, they had a feeling 
it didn’t quite make sense, but they just don’t understand where or 
why, and then have to go back to the translator, if they know where 
to find him, and ask him to do the job again. 

The Chairman. I can see some tremendous advantages of a machine
translation, because now I have often thought, as I sit and listen to a
great speech in the House delivered by some foreign dignitary, that we 
are at the mercy of a human translator. If he is sympathetic with the 
speaker, why then the end of the speech may find me sympathetic, 
whereas if he is unsympathetic he can color the translation in such a 
way as to produce the reverse results. 

A machine wouldn’t, at least we haven’t reached that art yet, where 
a machine might change and be sympathetic to one translation and 
unsympathetic to another, have we? 

Dr. OETTINGER. I think this is one difficulty we might avoid with
machines.

The CHAIRMAN. Mr. Fulton?

Mr. FULTON. At page 2 of Dr. Yngve’s statement, he says, “Support
has been adequate. I know of no one with a sensible research proposal
being for long without adequate support in this field.” Is that the case?
Would you both say you have had adequate funds for support, that is, 
both institutionally, Government, and also from private foundations 
and funds? 

Dr. YNGVE. I can say that at MIT we have had adequate support,
yes, sir. 

Dr. OETTINGER. I think that financially the support has been ade-
quate. There is one feature of all forms of Government support that
is extremely disconcerting in a university setting, and that is its annual 
nature. I find myself spending what I feel is excessive amounts of 
time preparing proposals and worrying about them. I find that it is 
extremely difficult to build up a good staff on the premise that, well, 
maybe here today, gone tomorrow. I think our work, and I’m sure 
that at most other universities, would benefit greatly from some form 
of more stable continuing support where the “chunks” were in 3-year 
periods or 5-year periods, rather than in annual terms. 

Mr. FULTON. Of course, that is one defect a legitimate person has
to combat, and also a weight he must carry, but you must remember 
that within the last 2 years in the West there appeared a man who had 
been working for 2 years on a very advanced research project and 
he had supposedly a doctor’s degree in the subject. He was working 
on it for 2 years, then they found he was a complete fake with no 
education at all. 

How do you distinguish between the ones who are supposedly on 
research and advanced thinking, and the ones who when the screen- 
ing is done, are worthwhile, because even the ones who are worthwhile 
may not come up with a valid result. 

Dr. OETTINGER. That is right.

Mr. FULTON. So is it based on your good intention, or is Congress
to look it over once a year and see how we think you are getting along?

Dr. OETTINGER. I think this could be done in either case. After all,
the example you cite presumably happened under the present system. 

Mr. FULTON. Very embarrassing.

Dr. OETTINGER. It need not be worse under longer grants, neces-
sarily.

The CHAIRMAN. I think you would have to have a very good com-
puter to work out a program where Congress will obligate itself far
in advance. [Laughter.]

Mr. FULTON. Would you like to be working for an institution that
was just permitted to idle along without supervision for a year, and 
longer, and then have it suddenly brought up to a halt while there 
was an extensive investigation going on, or would you rather have it 
proceed from time to time in shorter intervals? 

Dr. OETTINGER. Well, I think speaking for myself, I would prefer
the first alternative. The universities in a sense are organized in this 
way: At a certain stage they make a commitment to a man to keep him 
on without limit of time. This is a gamble, but I think the history 
of universities has shown that by and large this is a very good means 
for producing the kind of atmosphere in which imaginative research 
can take place. There is no question that under the tenure system 
occasionally a professor will sit back and do nothing for the rest of 
his life, but I think in the overwhelming majority of the cases this is 
not true. While I certainly couldn’t guarantee that under a longer 
term appropriation there wouldn't be some waste, my guess would be 
that by and large the effort would be a better one. 

Mr. FULTON. May I say I feel your statement is a little inadequate
with one respect, with a smile, because on page 4 of Dr. Yngve’s state- 
ment he says: 

MIT provides an ideal environment for the conduct of mechanical translation 
research— 
but you have no similar comment on Harvard. [Laughter.]

Dr. OETTINGER. Well, I think this is an omission that should be
remedied. I think it does. 

Mr. FULTON. From your statement possibly are you trying to do too
much? For example, I have been interested in the Navy, and in the 
Navy we can translate messages for a whole group of ships into flags. 
People see the flags and then they all operate. 

That is a system that must be infallible, because under emergency 
conditions they are operating within a thousand yards of each other 
at collision speeds that if they both turned the same way toward each 
other there is very little leeway in distance and speed. 

That means they must understand the signals. 

If you were starting in English, or a natural language, and tried 
translation on simple ideas within the language—take a sentence like, 
“All men are animals,” and then get the converse and the obverse of it, 
and translate it within the language so that you are able to get alter- 
native statements that limit meaning. Have you done any work on 
that?

Dr. OETTINGER. I’m afraid that this problem is of the semantic
kind, that we are the least equipped to handle, whether within English
or within any other language. I am sure that this kind of problem
would be as hard to solve within English as in going from one language 
to another, because the essence of the problem would lie in finding out 
just what the meaning of the initial sentence is in the first place. Once 
I know this, taking it into some other form, in English, or in some 
other language, is relatively easy. 

Mr. FULTON. I wonder if you aren’t omitting a step, or trying to
limit the meaning of a word.

Dr. OETTINGER. This step I’m afraid is one of the last ones.

The other examples that you gave are indeed easier in this sense,
that the flag system in the Navy is one very good example of an 
artificial manmade language. You see, I distinguish between a nat- 
ural language and an artificial one. You and I have no control as 
individuals over how English grows and changes.

Mr. FULTON. That is what I am trying to get away from. I am
trying to say to you, why don’t you start at a point in time and
then go on from there rather than try to go back and get all the
variations of meanings that are almost limitless in the past? Why
don’t you adopt maybe a thing which you would call basic Russian, 
and equate that, or correlate it with basic English? You see maybe 
you had better start on a limiting course which takes you out of 
these tremendous, incalculable, mathematically incalculable varia- 
tions. You see, if you are going to go back into past history and find 
how Napoleon used “empire,” and how Queen Elizabeth first used it, 
and this present Queen Elizabeth uses it, and try to put that into a 
machine, one word gets into an astronomical group of things that 
would lead off into a sequence.

Maybe the approach should be the other way around.

Maybe if you started with Mary and lamb, and build it so that you
have a group of meanings in a particular field you might have to 
limit it between two basic languages. I think possibly that would be
a better approach, wouldn’t it, than this other way?

Dr. OETTINGER. Well, I think you are absolutely right on this.
And in fact we have done this. However, even “Mary” and “lamb,” 
from that point of view, are still too complicated. What we have in 
fact done is gone back to still simpler language, some of the notations 
of logic, where meanings are much more rigorously defined, where 
the syntax of the language is much more rigorously defined, and we 
have studied some of the problems of translation there. Some of the 
results that we have gotten have indeed helped shed some light on 
what we were trying to do with Russian. So that I think what you 
say is quite true. It is necessary in some instances, to go to absolutely
the simplest thing you can find, and this we find in some of the 
mathematical and logical notations where we have studied some of 
those problems.

Mr. FULTON. May I thank you very sincerely for the indirect
compliment to Congressmen, and say what a high level the public
expects of Congressmen now. We are supposed to know the differ-
ences, for example, in these theoretical subjunctives, such as the puni-
tive conditional subjunctive, and likewise, on page 2, of your state-
ment here. We are supposed to know how to link the lexical to the
syntactic and the semantic phases of this subject—thank you, I 
deeply appreciate the honor. [Laughter.]

The CHAIRMAN. Now, I just had word that there was a vote on the
floor on an amendment, the Cooley amendment to the Agriculture 
appropriation bill, so they are approaching the final vote on the bill
which will be a record vote probably. My thought is this, Mr. King 
has not yet asked any questions. We ought to head for adjournment 
sometime around 4 o’clock if we are going to meet that vote. 

Mr. King 

Mr. KING. I will be brief.

The CHAIRMAN. That is 10 minutes.

Mr. FULTON. Is there a meaning to that?

The CHAIRMAN. That is not a directive to Mr. King, but I think he
ought to bear that in mind that there is that possibility coming up. 

Mr. FULTON. Mr. King, I hope you accept that semantic inference
latent in that sentence. 

The CHAIRMAN. The gentleman from Utah is recognized.

Mr. KING. Dr. Oettinger, I can’t help but approach this whole
scheme with a little healthy skepticism, although I am sure my skepti- 
cism will not dampen your ardor in any way, and I don’t want it to. 

But when I think through the problems that are involved in this, 
they just seem so staggering to me. You have, for example, words 
in English, like the word “run” that has something like 73 specific 
designated dictionary meanings, and the only way in which you can 
tell which meaning is intended is, well, it is just that something that
lies in the human mind that is able to segregate out of all of these vast 
possible meanings which one is actually intended. 

I don’t see how in the world you can breathe into this machine the 
ability to ferret out one of 73 meanings and translate that into Russian, 
and then you get into the matter of idiomatic constructions, which 
defy grammatical rules. They are rooted in history. They just grew 
up that way. Historical accidents, but there they are. And when 
you say them, you have a special meaning.

For example, you used the term “they’ll know that they’ve been had.”
What you mean, of course is that they have been imposed on. But I 
know that and you know that, but does this machine know that? Does 
it know that that is what you mean? Well, maybe you will have a 
little slot in your machine for “they have been had,” equals, “they 
have been imposed on,” and then your Russian machine will give the 
equivalent of “you have been imposed on,” but you are running into 
some very, very slippery situations there. Maybe one idiom means 
three different things. For example, he went to town. Well, “town” 
is a small city, so he went to the small city. But then you also mean,
he went to the business district of a large city. And then you also
mean, if you say it with just a little flicker of the eyelash, that he 
did a terrific job, he outdid himself. 

You know that and I know that, but does the machine know that, 
and how is the machine going to ferret out all of these shades of mean- 
ings and these idiomatic constructions and put over the equivalent in 
Russian or French or whatever it is? 

So what I’m saying is, it seems to me that the machine will do a 
super job on simple sentences, well, “Mary had a little lamb,” that 
sort of thing, where there is no particular problem of working out 
equivalents for you’re dealing with objects and concrete nouns and
simple verbs like “walk,” “talk,” “lie down,” and so on, but when you
get into the realm of abstract thinking, or the realm of idiomatic
constructions in English, French, and Russian, and they have hun- 
dreds of thousands of them I guess, I just don’t see how you could 
ever make the thing work. 

Dr. OETTINGER. If I could answer your question fully I think I
would have a working translator, but I can’t. You are quite right, 
the problems are staggering. What a machine does is precisely the 
product of what its designers design it to do: our problem is finding 
out how to instruct the machine. 

Now, if we try to look at the whole problem, it is staggering, but 
there are pieces that can be solved, and then some of the later stages 
don’t seem so bad. 

For example, take the word “run.” If I have the word “run” in 
isolation, and go to a dictionary the word “run” will be looked up, 
and we get the 73 meanings and so forth. 

In English, I could have “run” as a noun, say, “a run on the bank,” 
or I could have it as a verb, “we run for a train.” 

Now, it is possible to distinguish at least in many instances, between
the verbal use and the noun use in a fairly simple way. The presence 
of the article “a,” in “a run,” almost invariably indicates that the 
following word will be a noun. Therefore, taking the context into
account, we are able to eliminate all the possibilities associated with 
the verbal use of the word. 

Likewise, if I say “We run for a train,” the odds that a noun would 
follow “We” are very small. So that we can presume that “run” 
is a verb in this context. This sort of thing can be done by the kind 
of syntactic analysis techniques that I mentioned in my statement. 
However, we can’t do it and give you a guarantee that it will be right. 
This is still one of our problems. 
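
The context rule Dr. Oettinger describes can be illustrated with a minimal sketch. It assumes only the two cues named in the testimony: a preceding article suggests a noun, and a preceding subject pronoun suggests a verb. The word lists and the function names `guess_part_of_speech` and `tag_word` are hypothetical and do not reproduce any program discussed at the hearing. As the testimony cautions, a rule of this kind carries no guarantee of being right.

```python
# Illustrative only: a toy context rule, not the analyzer described in the testimony.
ARTICLES = {"a", "an", "the"}                                   # assumed cue list
SUBJECT_PRONOUNS = {"i", "we", "you", "they", "he", "she", "it"}  # assumed cue list


def guess_part_of_speech(words, index):
    """Guess 'noun' or 'verb' for words[index] from the word just before it."""
    if index == 0:
        return "unknown"  # no preceding word to use as context
    previous = words[index - 1].lower()
    if previous in ARTICLES:
        return "noun"     # "a run": an article almost always precedes a noun
    if previous in SUBJECT_PRONOUNS:
        return "verb"     # "we run": a noun is unlikely right after "we"
    return "unknown"      # the rule gives no answer; other evidence is needed


def tag_word(sentence, target="run"):
    """Return one guess for every occurrence of `target` in the sentence."""
    words = sentence.lower().replace(".", "").split()
    return [guess_part_of_speech(words, i)
            for i, word in enumerate(words) if word == target]


print(tag_word("A run on the bank"))    # ['noun']
print(tag_word("We run for a train."))  # ['verb']
```

The sketch only narrows the dictionary entry to its noun or verb senses. Choosing among "a run in the stocking" and "a run on the bank" is the harder semantic step taken up next in the testimony.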

Now, when we get into some of the other questions of what kind 
of a “run,” “a run in the stocking,” “a run on the bank,” and so forth, 
at this point things get much harder and really very little is known. 
In theory, as you, yourself, pointed out, one could put all possible 
combinations of “run” with other words into a stupendous storage 
device in a machine and have all various contexts accessible there for 
lookup. This is simply impractical. 

There is no question but that the problems are still very difficult. 
However, many of the things that today seem easy to us, seemed 
awfully difficult as little as 6 months ago, and so one just has to go 
ahead. 

Dr. Yngve has a sign over his desk saying: “This problem, when 
solved, will be simple.” And that is where we stand. 

Mr. KING. That is very good. I wish you well.

The CHAIRMAN. Well, what you are saying, in effect, is that we are
making substantial progress, and that what was impossible yesterday
is accomplished today. 

Dr. OETTINGER. Yes; but there is still enough that is impossible
today so that we haven’t finished our work.

The CHAIRMAN. You wouldn’t have a challenge if you didn’t have
that situation. 

Dr. OETTINGER. Yes.

The CHAIRMAN. Is there anything that we can do—Congress can
do, without unlimited time or appropriations—is there anything we
can do to assist in that program?

I know of nothing that would be more acceptable to our people than 
to get one of these Russian newspapers with fresh news about what 
Mr. Khrushchev is saying over there, run it through a machine, and 
get an impartial translation right away of what he says. 

Dr. OETTINGER. I am afraid no act of Congress can give us the time
and the inspiration that we need. This just comes when it wants to. 

The CHAIRMAN. You need time, and you need effort and inspiration.

Dr. OETTINGER. Yes, and I share Dr. Yngve’s feeling that since our
support comes from National Science Foundation, we are all right 
financially, and also morally all right in the sense that we are left 
to do as our scientific conscience dictates. I think we have made some 
progress, and we hope to make more, and nothing much will help us 
other than going back to our offices and thinking. 

The CHAIRMAN. Your relationship now with the National Science
Foundation on this project is satisfactory; is that correct?

Dr. OETTINGER. Extremely so.

The CHAIRMAN. Dr. Yngve?

Dr. YNGVE. Yes; as I said in my statement, extremely so.

The CHAIRMAN. It is all right with both of you, then.

Mr. FULTON. This looks like a long field ahead for ex-Congressmen.

The CHAIRMAN. I don’t think ex-Congressmen have any business
getting into this field. 

Are there any further questions?

Mr. FULTON. I want to thank them. They have made very excellent
presentations. 

The CHAIRMAN. Yes; it was most edifying, and a most learned
discussion, and we can only, as a committee, hope for the very 
best results at an early date with a most difficult problem that you 
have. 

Dr. YNGVE. Thank you, Mr. Chairman.

Dr. OETTINGER. Thank you, Mr. Chairman.

Mr. FULTON. The four doctors of science on the subcommittee here
today want to congratulate you. 

The CHAIRMAN. The subcommittee will adjourn until tomorrow
morning at 10 o’clock. 

Thank you again. 

(Whereupon, at 4 p.m., the subcommittee adjourned, to reconvene 
at 10 a.m., Thursday, May 12, 1960.)

THURSDAY, MAY 12, 1960


HOUSE OF REPRESENTATIVES,
COMMITTEE ON SCIENCE AND ASTRONAUTICS,
INVESTIGATING SUBCOMMITTEE,
Washington, D.C.

The subcommittee met at 10 a.m., Hon. Overton Brooks (chairman)
presiding. 

The CHAIRMAN. I call the meeting to order.

This morning we have the pleasure of having Brig. Gen. William J. 
Ely, Director of Army Research, accompanied by Lt. Col. Dimitri A. 
Kellogg, Army Research Office, and Lester Geiger, of the Signal Corps 
Research and Development Division.

I am told by counsel that Dr. Edward Cannon from the National 
Bureau of Standards is here. We want to go ahead and give the 
general an hour and reserve an hour for the Bureau of Standards. 
It is with that in mind I am going to ask you to proceed now, General, 
and we will get your statement and then have an opportunity to ques- 
tion you before we get to the Bureau of Standards. 

General Ely is a Pennsylvanian by birth. He has a long and dis- 
tinguished record in the Army. If you will proceed, General Ely, 
we will appreciate it. 


STATEMENT OF BRIG. GEN. WILLIAM J. ELY,’ DIRECTOR OF ARMY
RESEARCH; ACCOMPANIED BY LT. COL. DIMITRI A. KELLOGG,
ARMY RESEARCH OFFICE; GREGG McCLURG, ARMY RESEARCH
OFFICE; AND LESTER GEIGER, SIGNAL CORPS RESEARCH AND
DEVELOPMENT DIVISION


General ELY. Thank you, sir.

Mr. Chairman, I am Brig. Gen. William J. Ely, Director of the 
Army Research Office. I would like to introduce the following sup- 
porting witnesses: 


Lt. Col. Dimitri A. Kellogg, of the Army Research Office; Mr. 
Gregg McClurg, of the Army Research Office; and Mr. Lester Geiger,
of the Signal Corps Research and Development Division. 


® Brig. Gen. William J. Ely: Born on December 29, 1911, in Sycamore, Pa., William J. 
Ely graduated from high school in Claysville, Pa., in 1928. He attended Carnegie Insti- 
tute of Technology for 1 year before entering the U.S. Military Academy. He graduated 
with the class of 1933, receiving a commission in the Regular Army as a 2d lieutenant in 
the Corps of Engineers. 

Following graduation, he had permanent duty stations at Memphis, Tenn., Ithaca, N.Y. 
(where he attended Cornell University), Fort Belvoir, Va., San Francisco, Calif., Midway
Island, Honolulu, Hawaii, Fort Ord, Calif., and Washington, D.C., until February 1943. 
During this period, his duty assignments included civil works construction, military con-

My prepared statement today will cover: (1) The effort and funds
that the Army is expending on machine translation; (2) our research 
approach; and (3) our concept of the use of machine translation. 

First, some quick statistics on our program: $100,000 a year at the 
University of Texas in German-English and English-German trans- 
lation under Professor Lehman; $70,000 this year and $125,000 next 
year at the National Bureau of Standards in Russian-to-English 
translation under Mrs. Ida Rhodes. 

There are other research projects for information processing which 
have indirect bearing on machine translation for the future such as a 
field-type input device and a rudimentary print reader. This re-
search is funded at $200,000 this year. It is hard to say how much
of this should be charged to machine translation, but I would esti- 
mate $50,000. 

Confining myself to the two Army projects in pure machine trans- 
lation, there are about 20 people involved. Professor Lehman has 
15 persons, some of whom work part time. Mrs. Rhodes has two 
full-time persons and several volunteer workers part time. 

I would like to describe our two main projects briefly. 

Dr. Lehman’s project is a new one, less than a year old. It is a
long-term research effort based on computer analysis of parallel 
German and English texts. One might call this the objective ap- 
proach: examine the parallel text in two languages and derive rules 
for translation by means of the computer. This is a tedious, system-
atic approach, but it has the great advantage of being equally ap-
plicable in either direction for a pair of languages. Translation of
English into other languages by machine methods is extremely diffi-
cult because the English language has few “handles.” By “handles”
I mean identifying features built into a word which tell one im- 
mediately whether it is a noun or verb, its case, its gender, its num- 
ber, and so on. We ourselves make these identifications by context. 
One might almost say by instinct, but really by using the vast num- 
ber of tidbits of information in our minds. A machine has no such 
information unless every single bit has been put there, and we are 
very far from being able to put as many pieces of information into a 
machine as even a year-old child possesses. 


struction, troop duty with an engineer unit, and school at Fort Belvoir as well as at
Cornell.

He was assigned to Headquarters, Sixth Army, in February 1943, serving with the
engineer section of that headquarters throughout the war in the Pacific, seeing duty in 
Australia, New Guinea, Philippine Islands, and Japan. He returned to the United States 
in December 1945. 

After a few months in the Office of the Chief of Engineers in 1946, he was assigned 
to the faculty of the Armed Forces Staff College in Norfolk, Va., as an instructor in 
the Logistics Division in September 1948. He was reassigned to the Joint Logistics Plans 
Group, Joint Chiefs of Staff, in July 1949, serving there until October 1951. 

He then started a 2-year tour as Chief, Military Construction, Office of the Chief of 
Engineers, in Washington, D.C. From July 1953 to January 1956, he was district engi- 
neer of the Corps of Engineers in Sacramento, Calif., supervising large civil and military 
construction and real estate programs. 

He reported to Headquarters, U.S. European Command in January 1956, assuming his 
duties as Deputy Director, J-4.

He reported to the Office, Chief of Research and Development in March 1959, assuming 
his duties as Director of Army Research. 

He is married and has three sons. His advanced schooling includes a master of science 
degree in civil engineering from Cornell University. 

Decorations: He has been awarded the Legion of Merit with one Oak Leaf Cluster; the
Silver Star; and the Bronze Star. 

Promotions: He was promoted to 1st lieutenant, June 14, 1936; to captain, October 1,
1940; to major, February 20, 1942; to lieutenant colonel, August 21, 1942; to colonel,
March 25, 1944. He reverted to lieutenant colonel, July 1, 1947; was promoted to colonel,
December 30, 1950; and to brigadier general, March 16, 1956.

This is why a mathematical parallel correlation technique may
prove to be the best for English to other languages and why we are
supporting Dr. Lehman’s work. I would like to repeat that this is
a long-term project from which we cannot expect quick results. 
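
The statement does not say which correlation technique the Texas project uses. As a rough illustration of deriving word correspondences from parallel text, the sketch below assumes one simple statistic, the Dice coefficient, computed over sentence pairs that are already aligned. The sample sentences and the names `cooccurrence_scores` and `best_translation` are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(sentence_pairs):
    """Score (german, english) word pairs by the Dice coefficient.

    Each pair is counted once per aligned sentence pair in which both words occur.
    """
    german_counts, english_counts, joint_counts = Counter(), Counter(), Counter()
    for german, english in sentence_pairs:
        german_words = set(german.lower().split())
        english_words = set(english.lower().split())
        german_counts.update(german_words)
        english_counts.update(english_words)
        joint_counts.update(product(german_words, english_words))
    return {
        (g, e): 2 * joint / (german_counts[g] + english_counts[e])
        for (g, e), joint in joint_counts.items()
    }


def best_translation(scores, german_word):
    """Return the English word with the highest score for german_word, if any."""
    candidates = {e: s for (g, e), s in scores.items() if g == german_word}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical three-sentence parallel corpus.
corpus = [
    ("das haus ist alt", "the house is old"),
    ("das buch ist neu", "the book is new"),
    ("ein haus", "a house"),
]
scores = cooccurrence_scores(corpus)
print(best_translation(scores, "haus"))  # house
```

The statistic is symmetric in the two languages, which matches the point made above that the approach applies equally in either direction. With so little text, several pairs tie (for example "das" scores equally with "the" and "is"), which suggests why a large body of parallel text and a long-term effort would be needed.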

Mrs. Rhodes’ project is only slightly older than Dr. Lehman’s— 
about a year and a half. By oversimplifying, I might be permitted 
to call her approach the subjective one versus Dr. Lehman’s objective 
approach. Mrs. Rhodes is making a program for machine transla- 
tion of Russian to English by a method called predictive analysis. 
This might be described as taking each word in the sentence and 
shaking it by its handles until it gives the machine all the possible
information it contains both about itself and about other words in 
the same phrase, clause, and sentence. 

Russian words have many handles—prefixes, affixes, grammatical 
endings, and especially word agreements. A machine can be pro- 
gramed to identify these handles and make predictions of what else 
must be in the same phrase, clause, or sentence. When the machine 
finds these predicted items, it goes on to the next problem, satisfied. 
If it doesn’t find them, it stores the predictions and goes on with an 
eye cocked. If there are multiple choices, the machine makes a choice 
but stores the other choices to try if the first proves wrong. Some 
predictions must be fulfilled, such as having a subject and a verb,
expressed or implied. Others may be fulfilled or may not. When the 
machine reaches the end of a sentence, it examines its hindsight pool 
where unfulfilled predictions are stored. If it finds any that are 
labeled “must be fulfilled,” then it knows that the translation is 
probably faulty. This may happen for a number of reasons such as 
printing errors, omissions, or grammatical errors by the author. This 
is a quick and crude explanation of Mrs. Rhodes’ technique. The 
making of the program depends on inherent knowledge of the Rus-
sian language to a degree which makes everyday grammars and dic-
tionaries look like kindergarten readers, and also requires skill in com-
puter coding. Since Mrs. Rhodes is a mathematician, a Russian-born
linguist, and a computer coder, it is fairly obvious why she can create
such a program and why I call her method subjective. She pulls it 
out of her own mind, in computer code. 
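
The bookkeeping described above (raise predictions, store the unfulfilled ones, inspect the pool at the end of the sentence) can be sketched in a few lines. This is a toy under stated assumptions, not Mrs. Rhodes' program: the four-word English lexicon, the `Prediction` record, and the `analyze` function are hypothetical, and the sketch leaves out the multiple-choice step in which the machine tries an alternative when its first choice proves wrong.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    category: str     # grammatical category expected later in the sentence
    obligatory: bool  # True if the prediction is labeled "must be fulfilled"


# Hypothetical toy lexicon: word -> (its category, predictions it raises).
LEXICON = {
    "the": ("article", [Prediction("noun", True)]),
    "dog": ("noun", [Prediction("verb", True)]),
    "barks": ("verb", [Prediction("adverb", False)]),
    "loudly": ("adverb", []),
}


def analyze(sentence):
    """Return (categories found, obligatory predictions left unfulfilled)."""
    pool = []        # stored predictions not yet fulfilled
    categories = []
    for word in sentence.lower().split():
        category, raised = LEXICON[word]  # an unknown word raises KeyError
        categories.append(category)
        # The word fulfills the most recent stored prediction of its category.
        for i in range(len(pool) - 1, -1, -1):
            if pool[i].category == category:
                del pool[i]
                break
        pool.extend(raised)
    # End of sentence: examine the pool of unfulfilled predictions.
    unfulfilled = [p.category for p in pool if p.obligatory]
    return categories, unfulfilled


print(analyze("the dog barks loudly"))  # (['article', 'noun', 'verb', 'adverb'], [])
print(analyze("the dog"))               # (['article', 'noun'], ['verb'])
```

A non-empty second element corresponds to the case in the statement where an obligatory prediction is left over and the translation is probably faulty. The optional adverb prediction raised by "barks" may go unfulfilled without any such warning.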

I regret to report that at this moment Mrs. Rhodes is being treated 
for a heart condition which hospitalized her 10 days ago, probably 
from overwork. It appears, however, that she should be up and 
about in less than a month, which makes me happy both for her own 
sake and for ours because she is indeed the indispensable heart of her 
project. 

We in the Army think that Mrs. Rhodes has the most promising 
machine translation program there is. It is not as far along as some 
other programs, since it is relatively young, but we feel that hers has 
fewer inherent limitations, and that in the matter of less than a year 
she can demonstrate a product second to none. 

As regards the future of machine translation and its uses, the ob- 
jectives of Army machine translation research are as follows: 

1. Machine translation of selected scientific literature and intel- 
ligence documents with a timelag as close to zero as acquisition of the 
source material and any required postediting will permit.

2. Automatic abstraction of practically the entire scientific output
of the Soviet Union, except where abstracts in English already exist, 
or else where Russian abstracts simply need translation. The avail- 
able or automatically prepared abstracts are to be translated. On 
the basis of these, selected complete texts will be translated. 

These objectives call for some explanation. First, the term “ma- 
chine translation” can be misleading. What we really mean by ma- 
chine translation from a practical standpoint is 90 to 95 percent ac-
curate transfer of the intended idea, counted by whole sentences, with
no preediting or postediting. 

If the product serves its purpose in this manner, it is truly machine 
translation. If it requires postediting of any sort, it should be called 
machine-aided translation. 

We hope to attain practical Russian-to-English machine-aided 
translation within the next 2 or 3 years. “Practical” is the key word 
in this statement. It means more economical in operating cost, time, 
and utilization of skilled manpower than hand translation is today. 
When this point is reached, volume production is in order. 

We believe that postediting will be needed for some time, primarily 
because of the semantic problems which will cause ambiguous mean- 
ings even though the grammatical problems are solved. These se- 
mantic problems consist of a print out of two or more possible English 
meanings, from among which a nonlinguist can make a proper choice 
if he is familiar with the subject being discussed. 

Finally, I would like to mention the input problem, which is a 
bottleneck in machine translation research and eventual production, 
since input to machine translation is presently prepared by hand 
keypunch at 600 words per hour per machine. A skilled translator 
can do hand translation at 600 words per hour without the benefit 
of any machine. Obviously we need an input system that can keep up 
with a machine translation program that can turn out 10,000 to 30,000 
words per hour. The solution is a mechanical print reader which can 
take a printed journal page and put it on tape by automatic optical 
scanning in a couple of seconds. The Army is sponsoring the de-
velopment of such a device on a priority basis for use by all projects
desiring it for machine translation research and production. A num- 
ber of devices exist or are being made which contain some of the 
desired features, but none which can do all that is needed; namely, 
take everything on a printed Russian journal page and put it on tape, 
including words, equations, and drawings. 
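
The size of the input bottleneck follows from the figures in the statement: keypunching at 600 words per hour against a translation program that can turn out 10,000 to 30,000 words per hour. The short calculation below works that out. The function names are hypothetical, and the 500 words per page used for the print reader is an assumption for illustration only, since the statement gives no page length.

```python
import math

KEYPUNCH_WORDS_PER_HOUR = 600  # figure given in the statement


def keypunch_stations_needed(program_words_per_hour,
                             keypunch_rate=KEYPUNCH_WORDS_PER_HOUR):
    """Keypunch machines needed to keep one translation program supplied."""
    return math.ceil(program_words_per_hour / keypunch_rate)


def reader_words_per_hour(words_per_page, seconds_per_page):
    """Throughput of an optical print reader; both arguments are assumptions."""
    return words_per_page * 3600 / seconds_per_page


print(keypunch_stations_needed(10_000))  # 17
print(keypunch_stations_needed(30_000))  # 50
print(reader_words_per_hour(500, 2))     # assumed 500-word page read in 2 seconds
```

On these figures a single program would need between 17 and 50 keypunch machines working continuously, while a reader that handled an assumed 500-word page every 2 seconds would exceed the program's appetite many times over.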

A number of recent developments make such a device feasible now, 
so we propose to make one, for the benefit of our research, everyone 
else’s research in machine translation, and for eventual production. 

Production of machine-aided translation of Russian to English is
our short-term objective for scientific information and intelligence
purposes. We also wish to pursue the long-term goal of machine 
translation without human postediting. In the pursuit of both these 
objectives, we offer and seek maximum possible cooperation with 
every other agency and service since the benefits will accrue to all. 
The Army appreciates this opportunity to present its machine trans- 
lation program to your committee. 

The CHAIRMAN. Thank you very much, General Ely. We appre-
ciate your fine statement. 




It appears from looking at your statement that you feel that 
machine-aided translations will be practical within a year or two. 

General ELY. Two or three years, as stated in the statement, sir.

The CHAIRMAN. Two or three years. Then we can proceed with
the knowledge that when the translation occurs it is going to have
to be checked over by a human factor?

General ELY. This is our feeling at this time. In other words, we
are seeking the longer-range goal where you can really have machine 
translation, but within the foreseeable future we feel that you will 
have to have machine-aided translation. This does not mean neces- 
sarily that you will have to use skilled translators for postediting. 

The CHAIRMAN. Now, I notice the Army has been in this program
about a year and a half. 

General ELY. That is correct, sir. 

The CHAIRMAN. Other agencies are in the program, too, as you said, 
the CIA and the Air Force, the Navy, and the National Science 
Foundation. How did it happen the Army got in late, and why 
should the Army go in it when the others were already on the 
program? 

General ELY. The Army, as you know, sir, has many uses for, and 
has been interested in, automatic data processing in a variety of ways 
for a number of years. In fact, I am sure you know that the Army 
was responsible for the development of the first high-speed 
computer. 

In dealing with the scientific problems and the intelligence 
problems, it is a natural consequence of our computer interest that we 
would also be interested in this other use of computer techniques. 

The CHAIRMAN. In other words, this is just a side use for the 
computer which you have helped to develop? 

General ELY. Well, for computers in general. The one we have 
helped to develop has long since been passé; there are far better 
computers. Mrs. Ida Rhodes’ project, which is the one we are most 
hopeful about, really originated in connection with our activities in 
the computer field. We had been interested in machine translation. 
We heard there was much hope for her method. Therefore, the Army 
decided to put some support behind it. 

The CHAIRMAN. And that at the present time is the most promising 
program the Army has, the Rhodes program? 

General ELY. Yes, sir. 

The CHAIRMAN. The Army has representation on the CIA 
committee, and on the National Science Foundation Committee. Can’t 
you depend on those two committees for your program? 

Tell the committee why you need the additional program. I under- 
stand historically you have helped develop the computer, but is that 
a justification for going into this program? 

General ELY. No, sir, I wouldn’t call that the justification, but this 
is like all other research. We are on these committees. We are co- 
ordinating our activities with theirs. They know what we are doing. 
We know what they are doing. We feel that this is an area in which 
a lot of hands can be used profitably to produce the end result that 
we most desire. This is like missile research or other research; more 
than one agency should be in the activity and working along the hope- 
ful direction. 

The CHAIRMAN. Of course, there is a limit beyond which you 
wouldn’t want to go there. You wouldn’t want too many in there. 

General ELY. That is correct. 

The CHAIRMAN. And there is bound to be duplication, even in your 
present program. At the bottom of page 4 of your statement 
you make reference to the Army proposal for the development of an 
automatic optical scanning print reader. Tomorrow we are going to 
hear testimony of Baird-Atomic, Inc. This company has a print 
reader under development. Are you thoroughly familiar with that 
development? 

General ELY. I wouldn’t say that I am thoroughly familiar. I 
believe Colonel Kellogg on my left here has a good familiarity with 
it, and I would like to have him speak as to that, if you would care to 
hear him. 

The CHAIRMAN. All right, Colonel Kellogg. 

Colonel KELLOGG. In our proposal we first examined the field of 
print readers, essentially in the entire industry. I believe we in the 
Army have either read of or seen or tried to study every print-letter 
reader anyone has proposed. After so doing, we came to the con- 
clusion that none of them had the desired features that are required 
for the program, nor could be modified to do so without starting over. 

After doing this, we submitted our proposal. 

The CHAIRMAN. Then you are familiar with the Baird-Atomic, Inc., 
development? 

Colonel KELLOGG. Yes, sir; I have not actually seen the device, but 
I believe I am as familiar with it as I can be without having done so. 

The CHAIRMAN. Are you satisfied with the Baird-Atomic machine 
so that when it is perfected it will accomplish what you want, what 
you have in mind for your own program? 

Colonel KELLOGG. No, sir, I do not believe the machine, as presently 
given in the specifications, will satisfy our requirements. I can go 
through those, if you wish, in detail. 

The CHAIRMAN. All right, sir, I think it would be wise for this 
reason: If Baird-Atomic is going to develop this, then it brings 
up the question: Why should the Army be developing the same 
thing? 

Tell us something about it to justify your working on the same pro- 
gram. 

Colonel KELLOGG. Yes, sir. May I say in preface, Mr. Chairman, 
that the purposes sometimes differ. The Air Force method, though I 
don’t want to tell about their program for them, was devised for a 
quick automatic look-up, which is an excellent device for an excellent 
purpose. To my knowledge, the Baird-Atomic device is an extension 
of this into an integrated system, which for its purposes of look-up, 
word by word, I believe to be an excellent device. 

This is not our objective. We happen to feel that this is not 
adequate for the objective of translating scientific articles. There are 
other concerns also interested in print reading for a different reason; 
namely, in English, there is a tremendous need for information stor- 
age and retrieval. 

Any company that can make such a device successfully, I believe, 
will have great sales. The point is that no device, to my knowledge, 
that exists or is under construction or in the planning stages today 
incorporates the features that we feel are required. Specifically, we 
want no preediting. The taking of a document, pasting sheets of 
paper on it, making cutouts, inserting alinement marks, and so on, I 
believe is defeating the purpose. 

Photography is an excellent means of getting contrast, but if it is 
not an integral part of the system, I don’t believe it answers the 
purpose. 

The use of a limited number of fonts and inability to handle italic 
are limitations. For example, Russian mathematical texts sometimes 
have whole pages of nothing but italic. Italic must be handled. 
Small tidbits of equations that occur right in the text, such as 
“EL =m” and so on, must be handled, or else the whole machine will 
operate by fits and starts. 

There is also the matter of compatibility. There are a number 
of machine translation research programs with none of which I 
believe the Baird-Atomic system to be compatible. 

It is designed as an integrated system, therefore, with itself, it is 
compatible, but all of these other programs, including those sponsored 
by other agencies, require a reader which has different characteristics. 
This we propose to make. We have gone to the CIA committee for 
example, and asked them to gather the specifications and needs of 
every other agency so that we can try to incorporate them. 

The CHAIRMAN. I understand that Baird-Atomic will handle about 
100 fonts; that is, 100 different sizes of type? 

Colonel KELLOGG. The statement in their letter, sir, is 12 fonts, 
which my letter from the Air Force states have not yet been selected. 
Each font may include 100 characters, sir. 

The CHAIRMAN. Mr. King. 

Mr. KING. Very little, Mr. Chairman, perhaps one or two questions. 

Mrs. Rhodes’ system of which you speak, General Ely, can it handle 
idiomatic constructions, or just a matter of translating word for word 
and ignoring the more complex idiomatic combination of words? 

General ELY. This is why we feel that it is a good one; it can 
handle idiomatic constructions by its method of taking a word or a group 
of words, shaking it, and trying again and again to fit this into the 
proper sequence, and thus coming out with the thought as expressed 
by the writer. 

Mr. KING. Sometimes an idiomatic construction has no basis in 
logic at all. It is rooted in history. In other words, the people 
must work out a combination of words that have a certain meaning 
to them that are unrelated to the meaning of the individual words. 
For example, we say “get on the ball,” or “he’s a screwball,” terms 
like that are rooted in our knowledge of baseball, but they have no 
relationship to the word “ball,” it is just that we understand the 
patterns of the game of baseball. But how would a machine taking 
individual words be able to come up with a total meaning? 

General ELY. I will ask Colonel Kellogg to answer this in detail, 
since he has worked with Mrs. Rhodes, but I am sure that there will 
be phrases in the Russian language, as there are in our own, that mean 
something only to the author at times, and that if they haven’t 
programed the idiom into the machine, the machine may not work it out. 

The CHAIRMAN. If it is “screwball,” it is still “screwball” in the 
machine, is it not? 

General ELY. That is right. 

Mr. KING. Let me ask one more question, then maybe you would 
like to expand on all of them. Isn’t it possible, though, in fact prob- 
able, that you are also going to have a significant margin of error be- 
cause of these very factors that I mentioned, and if that be true, even 
if the margin of error is only 2 percent, still that could be important 
enough to jerk out the entire foundation of the whole article. Maybe 
the 2 percent that you missed is the most important 2 percent, and the 
key to everything else. As long as there is a significant factor of 
error, aren’t you always reading that with some apprehension and 
reservations in your own mind, and thus might that not really destroy 
the scientific value of the translation? 

General ELY. I am sure that is correct. As you know, even the 
best hand translation is not 100 percent accurate. So to expect that 
a machine would be 100 percent accurate is beyond hope. However, 
we do feel that we can, with Mrs. Rhodes’ system, get to a 90 to 95 
percent accuracy. Say the document being translated is in the hands 
of a physicist who has a little knowledge of the Russian language. 
We feel that he would be able to get the “meat” that he really needs 
from the document as translated. 

Mr. KING. This type of work would be best in scientific areas, is 
that not correct, and where you move into literature whose impact 
is mainly emotional, literary, then you are getting a little far afield, 
aren’t you? 

General ELY. I would expect you could do better in factual 
scientific writing than you could in emotional or any other type of writing; 
yes. 

Mr. KING. That is all I have. 

The CHAIRMAN. I hope the military does specialize not in poetry 
but in science. 

General ELY. We do, sir. We have no research in the poetic field. 

The CHAIRMAN. Mr. Bass. 

Mr. BASS. No questions. 

The CHAIRMAN. I have one or two more questions to ask you, 
General. 

The Army is spending—has spent $59,000 over a period of years, 
haven't they, on this project? You have $100,000 for the current 
year? 

General ELY. We have $100,000 for the current year in Professor 
Lehman’s German-English, and we have $70,000 for the current year 
on Mrs. Ida Rhodes’ project, sir. 

The CHAIRMAN. You really have $170,000 for the current year. 
Now, next year what are you asking for? 

General ELY. Next year essentially the same level of funding. The 
next year we are asking for $125,000 for Mrs. Rhodes’ project, and 
approximately $100,000 for Dr. Lehman. 

The CHAIRMAN. That makes a total of $225,000. 

General ELY. That is correct. 

The CHAIRMAN. Heretofore you have spent, as I understand, in 
prior years about $59,000. 

General ELY. We spent $59,000 on Dr. Lehman’s work in fiscal year 
1959, and a total of $50,000 on Mrs. Rhodes’ work in 1958 and 1959. 

The CHAIRMAN. If you want to verify it, you can change it in the 
record. 

Since your expenditures are going up, you certainly must feel that 
you are making progress in approaching the time when you strike 
paydirt; to use the vernacular of the oil and gas fields. You will 
strike paydirt on this soon, is that right? 

General ELY. In the case of Dr. Lehman’s project, as I indicated in 
my testimony, this is a long-range project, this German-to-English 
and English-to-German translation—so we don’t expect to strike 
paydirt on that soon. But in Mrs. Rhodes’ project, we do expect to 
strike paydirt, as I say, within 2 to 3 years. And the reason for this 
increase is that she is now able to start programing and putting a 
program in the computer. As you well know, computer time is ex- 
pensive, and one of the more important facets of the increase is the 
payment for computer time. 

The CHAIRMAN. Do you have any specimens that you tried to run 
off that show the weakness of the translations? Do you have 
anything of that sort you could show the committee? 

General ELY. We have no specimen that has been programed into 
and taken from a machine, but we do have—Colonel Kellogg can de- 
scribe it a lot better than I can—we do have the “thinking” process 
that she has gone through that has been submitted to expert criticism, 
and apparently she is on a very sound and good approach to the 
problem. 

The CHAIRMAN. But you have not reduced it to actual operation of 
translation? 

Colonel KELLOGG. It has been placed on paper, and in Mrs. Rhodes’ 
mind, which works like a computer. She is now preparing the pro- 
gram and trying that out piece by piece, but it has not been put 
through to the point where we could offer anything to show, except a 
sample of what we think it will look like. This we do have. This is 
not a fair comparison with what others have actually put through a 
machine, for their sakes. 

The CHAIRMAN. Should one agency or suborganization be assigned 
the national job of translating all documents at one center? That, 
of course, embodies the idea of a centralized agency. I think the 
National Science Foundation is trying to keep everybody working 
together in cooperation, but would it be better if you had one agency? 

General ELY. I think not; no, sir. I know we will lick this business 
of print reading and machine translation. I don’t feel that it should 
be concentrated, because it involves the use of general-purpose com- 
puters—all types of computers, which will be available in many 
places. 

The CHAIRMAN. Even if the agency were the Army, you wouldn’t 
agree to that? 

General ELY. No, sir, I certainly would not. 

The CHAIRMAN. Do you think that each department of Government 
should do its own translation? 

General ELY. This may have some merit. The main thing we 
should have, of course, is such coordination and communication be- 
tween agencies that we know that we are not translating one docu- 
ment in several different places, but the chances are that if we localize 
the various types of documents to an appropriate agency, then this 
duplication will not exist. I agree that it should be well coordinated. 

The CHAIRMAN. I can see where there would be a disadvantage in 
having the same document translated in two or three different agencies, 
because they might look at it and might not recognize the same 
document. 

General ELY. For a while this may be true, yes, sir. 

The CHAIRMAN. Mr. Fulton. Mr. Fulton has a question. 

Mr. FULTON. The question is: How do you go about it? Do you 
try to do the whole language or do you start with a system of signals, 
such as in a particular field, in the Army, and develop from there? 
For example, do you just start out with a purpose, or do you just 
wander out all over the field of the language and try to pick off all 
the facets and barbs of a particular word or a [illegible]? You see, I 
think it might be better to have a purpose, and a limited purpose, 
and then proceed from some very concrete word blocks and build a 
pyramid out of it. 

General ELY. I believe this is essentially what the most hopeful 
project that we are working on, Mrs. Rhodes’ project, intends to do. 
She is working now primarily in mathematics. She will undoubtedly 
program a vocabulary in mathematics, and her system is such that 
with some addition to this vocabulary at a later date—and it will take 
additions for physics, it will take an addition for chemistry—but her 
system gets at the shaking out of the words in this particular field so 
that it can be programed into a machine; we are not just going to 
“translate the Russian language.” She is concentrating in an area. 

Mr. FULTON. You see, when you come to signals, communications 
in the services, you don’t have idiomatic meanings or colloquial 
meanings, and you don’t get into very difficult structures. Now, possibly 
you are not only coming up with what the language has been, but on 
this means of translation you could come up with a type of basic 
English in apposition to a basic Russian in a particular field. So 
you are more or less going to have the people who use it use your 
forms, rather than picking off what is already existing. Did that ever 
strike you? 

General ELY. I would like for Colonel Kellogg, who is close to this 
effort of Mrs. Rhodes, to answer that question. 

Colonel KELLOGG. I am not sure I understand your question, sir; 
is it a matter of pinning 

Mr. FULTON. If I were in the Signal Corps, I would adopt the Signal 
Corps system of signaling. Naturally, if I were a naval officer I 
would have to adopt flags and know what they mean on the bridge. 
Maybe if you would, on a practical system, start with these blocks 
of signals, and then expand from there for a practical use. We have 
had some scientists here, and when they got through I think Mr. King 
pointed out there are 73 variations of the word “run.” If you take 
a variation of the word “run” and put it against something like 
“man,” or the way it is used in many contexts, you can get an 
astronomical number of possibilities out of taking just the two or three. 

Colonel KELLOGG. Sir, in going from English to Russian, I agree, 
I don’t know of any way to do it. Professor Lehman hopes he 
can. The Russians are working on this with 10 times as many people 
as we are; they are having a bad time. English doesn’t have a lot 
of “handles” to grab at. In Russian this problem does not exist. 
There are a number of ambiguities, it is true. But for the word 
“run,” instead of having just one English word to look at, you would 
have a dozen possible noun forms, and over a hundred verb forms 
to look at. In the Russian there would be relatively few ambiguities. 

In German, the ambiguity gets worse. That is why German is 
easier to learn but harder to program into a machine. 

Mr. FULTON. My point is this, instead of just going generally into 
the whole field and trying to cover it as between two languages with 
its multitude of possibilities and probabilities, why don’t you get a 
basic system of communication that you and the Russians can pretty 
well start on, and that has their position and then your position in it, 
and move from there on certain agreed approaches. You see what 
you are into, you are taking one end of the stick and nobody has the 
other end, and they have the other stick and you don’t have the other 
end of their stick. 

Colonel KELLOGG. I agree, this is a bad problem. If we could agree 
on two simple things we would both be happy. One is a uniform 
transliteration system, just for changing people’s names from one 
alphabet into the other. We have a system by agreement within the 
Western World; the Russians use a different one. 

Secondly, if we could agree to have them print a little box instead of 
a period at the end of a sentence, and a comma that is a little more 
definite than the kind we have now, and if we traded them journal for 
journal, this would save us trouble—but we can’t get agreement. The 
Russians have started to put English abstracts at the ends of some of 
their articles, and some articles I translate by hand have such ab- 
stracts. This we have not done, but this matter is outside the Army’s 
scope. 

Mr. FULTON. Has anybody been trying to get this cooperation and 
correlation with the Russians? 

Colonel KELLOGG. Yes, sir, the National Science Foundation, to my 
knowledge, very definitely is. 

Mr. FULTON. If I had any comment it would be, from the ones I 
have been hearing so far, that you might be taking too big a bite 
of the cake. If you try to cover the field of knowledge, as has been 
said here, of a 7-year-old in a language, you would get something that 
is much different than a main projection where you are going to build 
up an instrument for use in communications. 

Colonel KELLOGG. Mrs. Rhodes’ approach, sir, is to take the nastiest, 
hardest sentence she can find, without worrying about the vocabulary, 
and see if she can feed that sentence into a machine and have it come 
out in the proper order. While doing this work, we are not allowed to 
see the meanings of the words; we are just shown what the machine 
can shake out of them—a noun, this case, this gender, this mood, and 
so on. 

Now, knowing all these facts, you see if you can devise a program 
which will rearrange all these into the proper order. Then go back to 
your dictionary and get into the semantic problem of meanings to fill 
in. Dictionary comes last in this approach, not first. 
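
Note (illustrative, not part of the testimony): The ordering Colonel Kellogg describes, grammatical facts first and dictionary last, can be sketched as a toy Python program. The tags, the single reordering rule, the placeholder words, and the glossary are all invented for illustration; the testimony gives no detail of Mrs. Rhodes' actual program, and this is not a reconstruction of it.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str       # opaque source word; its meaning is not consulted while reordering
    pos: str        # part of speech, e.g. "noun" or "verb"
    case: str = ""  # grammatical case for nouns, e.g. "nom" or "acc"


def reorder(tokens):
    """Arrange tokens as subject, verb, remainder, using grammatical tags only."""
    def rank(tok):
        if tok.pos == "noun" and tok.case == "nom":
            return 0  # nominative noun: treated as the subject
        if tok.pos == "verb":
            return 1
        return 2      # everything else, including accusative objects
    return sorted(tokens, key=rank)  # sorted() is stable, so ties keep source order


def fill_meanings(tokens, glossary):
    """Dictionary lookup happens last; unknown words pass through unchanged."""
    return [glossary.get(tok.form, tok.form) for tok in tokens]


# Hypothetical three-word sentence in object-verb-subject order.
sentence = [Token("W1", "noun", "acc"), Token("W2", "verb"), Token("W3", "noun", "nom")]
glossary = {"W1": "equation", "W2": "solves", "W3": "machine"}  # invented glosses

ordered = reorder(sentence)
print(fill_meanings(ordered, glossary))
```

The point of the sketch is only the order of operations: `reorder` never looks at the glossary, so the syntactic step can be tested before any vocabulary exists.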

Mr. FULTON. When you go to [illegible] and try to translate your ideas 
into an idiom, there is a written language. They don’t give you the 
nastiest sentence they can think of, first. They start with simple 
blocks and build you up from there, and they do it for a purpose. You 
are in a spelling class; you are in an English composition class. And 
my question really is as to the basic [illegible] of the approach. 
Maybe you would do better if you would have a system of practical 
[illegible] with small blocks that is for a particular practical purpose. 

Colonel KELLOGG. I think we are past that point, sir. This has 
been done for years. The first block was word by word. The next 
was prepositional phrase. There has been other research in this field 
for a long time. The reason we are supporting Mrs. Rhodes is be- 
cause we think she has it licked for any length sentence you care to 
write, right now. We can’t prove it yet until this program comes 
out. I won’t ask anyone to believe me until we see it go in one end 
and out the other end of the machine. But she is convinced, I am 
convinced, and I believe the Army is convinced, that she has this prob- 
lem solved. The semantic one, no; we do not have that one solved. 
But to be able to take any sentence, put it through and have it come out 
syntactically correct—this I believe we can do at better than 90 per- 
cent. But the semantics are what knock us way, way down. That is 
what is going to hurt. 

Mr. FULTON. I am sure Mr. King and I will be more than pleased 
to find out that that is successful, because if you take a 73-variation 
word, combine it with a 50 and a 30 and a 20 and a 10, and maybe 
a 5- or 6-word sentence, you’ve really got something to figure out. 
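
Note (illustrative, not part of the testimony): Mr. Fulton's figures can be multiplied out directly. The short Python sketch below assumes the five counts he names (73, 50, 30, 20, and 10) are the number of possible readings of five words in one sentence, and that every combination must be considered.

```python
from math import prod

# Readings per word, taken from Mr. Fulton's example.
sense_counts = [73, 50, 30, 20, 10]

# Each word's reading is chosen independently, so the counts multiply.
combinations = prod(sense_counts)
print(f"{combinations:,} possible combinations")
```

Under that assumption the product is 21,900,000 combinations for a single five-word sentence.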

Colonel KELLOGG. The machine can do this, sir. It has the time and 
the speed to handle it. It can resolve these difficulties. Unless the 
author made a grammatical error—then we are in trouble—or the 
[illegible] made an error. Here, unfortunately, in the work I do by 
hand translation I find one error about every page and a half—and 
serious errors—that I have to draw on my knowledge of physics to 
correct. But the machine can’t hope to solve that. Here is another 
rough part: Transposed ordinate and abscissa in a graph; the ma- 
chine can’t handle that. 

Mr. FULTON. From the service point of view: Is this program 
practical enough to warrant further expenditures so that we aren’t going 
down a scientific dead end, for example? 

General ELY. I don’t believe that more money would help materially 
in Mrs. Rhodes’ program which we are raising from $70,000 this 
year to $125,000 next year. Apparently, much of this is in her own 
brain, so it is not something that a large expenditure of money will 
bring out. It is coming out of her computer-type brain. As we 
mentioned in the latter part of my presentation, we do have need for a 
print reader that will make this a worthwhile business after we 
solve the machine translation, and we are seeking more money now to 
go to work on a print reader that will really do the job and do it fast. 

The CHAIRMAN. Well, I think you testified that you have hopes 
that you will develop within the 2- or 3-year period a translation aid 
program. 

General ELY. Machine-aided translation within the next 2 to 3 
years; yes, sir. 

The CHAIRMAN. Let me ask you this: Are you familiar with the 
empirical approach used by several agencies? 

General ELY. I have heard of the Georgetown project, but I can’t 
say I am familiar with it. 

The CHAIRMAN. You can’t say whether you agree or disagree with 
that approach, since you are not familiar with it? 

General ELY. Colonel Kellogg has some definite theories, if you 
would like to hear them. 

The CHAIRMAN. All right, Colonel, will you give us your theories 
on that? 

Colonel KELLOGG. On purely technical grounds, there are a number 
of major differences of opinion in this field. We all try to cooperate, 
but if we didn’t disagree, there wouldn’t be any technical advance. 
I believe that the Georgetown product at this moment is the most 
advanced of any that exists that could be put into production if we 
had to have it this minute. However, I feel it is about 55 percent 
good. Professor Yngve said “50 percent,” I will say “55.” I don’t 
feel that is good enough. 

I believe another project has something in the nature of 40 per- 
cent good. 

The CHAIRMAN. You mean 55 percent good, it will turn out 55 
percent words properly translated, and the other 45 

Colonel KELLOGG. No, sir, sentences, Mr. Chairman. 

The CHAIRMAN. About 55 percent of the sentences will be 
satisfactory, and 45 will not be; and that is the best we can do now? 

Colonel KELLOGG. Yes, sir. This is the best product available 
today, in my opinion. 

The CHAIRMAN. That would help us? 

Colonel KELLOGG. If we had to have it immediately, it would be 
helpful; but there are other technical reasons why I don’t think they 
should go into production now. This is the goal we are shooting for, 
and we think we can surpass. We need more accuracy. 

The CHAIRMAN. The services of how many people will be 
eliminated? 

Colonel KELLOGG. The number of translators that you can replace, 
sir, skilled translators, doing 600 words an hour for 5 hours a day, 
is between 100 and 200. 

The CHAIRMAN. It would take the place of one to two hundred 
skilled translators? 

Colonel KELLOGG. Yes, sir. 

The CHAIRMAN. Every day? 

Colonel KELLOGG. Yes, sir. 
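
Note (illustrative, not part of the testimony): Colonel Kellogg's figures imply a daily word count, which the Python sketch below works out. It assumes only the numbers he states (600 words an hour, 5 hours a day, 100 to 200 translators); the function name is invented for illustration.

```python
WORDS_PER_HOUR = 600  # stated rate for one skilled translator
HOURS_PER_DAY = 5     # stated productive hours per day


def daily_words(translators):
    """Words translated per day by the given number of skilled translators."""
    return translators * WORDS_PER_HOUR * HOURS_PER_DAY


low, high = daily_words(100), daily_words(200)
print(f"{low:,} to {high:,} words per day")
```

On these figures one translator produces 3,000 words a day, so the machine would be doing the equivalent of 300,000 to 600,000 words a day.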

The CHAIRMAN. Of course, the machine doesn’t tire; the human 
translator does tire. In that event, are you able to say what these 
machines will cost? 

Colonel KELLOGG. The rental of the computer, sir, is the catch. 

You don’t really buy them, usually. 

The CHAIRMAN. If you develop the computer, won’t it be your 
computer? Or will you rent it from somebody? 

Colonel KELLOGG. It doesn’t matter, sir; I think this is dependent 
on other factors. We don’t care. 

General ELY. I think there is a misunderstanding here, if I may 
clarify it. This doesn’t require the development of a new computer. 

This requires the development of programs that you can program 
into existing computers. 

The CHAIRMAN. You’ve got to develop another brain for that 
computer to do the translation, isn’t that about it? 

General ELY. It is programing into the computer, as we do now 
with various kinds of figures and data all the time. This is just 
another program for a computer. 

The CHAIRMAN. It is the same basic computer, but you will set up 
different categories in the brain of that computer? 

General ELY. That is right. 

Mr. FULTON. There is no mechanical change in the computer? 

General ELY. No. 

The CHAIRMAN. You lengthen the memory of the computer, don’t 
you? 

General ELY. Yes; with the input. More rolls of tape in the 
computer is probably the simplest way to describe it. 

The CHAIRMAN. In translating Russian, how many words would 
the computer have to be familiar with in order to make an effective 
translation? 

General ELY. Colonel Kellogg says 50,000 Russian words. 

The CHAIRMAN. Then in addition to that, if it is scientific 
material, you will have to be familiar, of course, with numerals and 
scientific equations; is that correct? 

Colonel KELLOGG. The numerals are the same, sir; the equations 
have minor differences. 

The CHAIRMAN. The numerals would come in the equation class, 
but in addition to the 50,000 words, the computer would also have to 
be familiar with scientific and mathematical equations. 

Colonel KELLOGG. Sir, some of this can be bypassed. The equations, 
luckily, are much the same in Russian. Some of the superscripts and 
subscripts change, which gives us a little problem. But the graphs 
come through very nicely, except for the change of the caption and 
the labeling of the ordinate and abscissa. This is a rough problem 
because they are printed up and down. 

The CHAIRMAN. They use the same numerals, of course; they use 
the same scientific equations. 

Colonel KELLOGG. Yes, sir. In fact, they use many of our letters. 
For “energy” they happen to use a capital English “E,” not a Russian 
“K.” And for the chemical symbols, they use our symbols, or the 
Latin symbols, if you will. 

The CHAIRMAN. That wouldn’t worry your machine; would it? 

Colonel KELLOGG. No; it shouldn’t. There are small captions to 
these curves and graphs, which are written vertically rather than 
across the bottom. This is a bad problem when you want to translate 
those. They use a comma instead of a decimal point. Some human 
translators leave that alone. I prefer to change mine for the people 
I do my business with. 
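
Note (illustrative, not part of the testimony): The decimal-comma change Colonel Kellogg prefers to make by hand is simple to state as a rule. The Python sketch below assumes the rule "a comma with a digit on both sides is a decimal comma"; that assumption would misread a list such as "1,2,3", so it is a starting point rather than a complete treatment. The function name is invented.

```python
import re

# A comma with a digit immediately before and after it is taken to be a decimal comma.
DECIMAL_COMMA = re.compile(r"(?<=\d),(?=\d)")


def to_decimal_point(text):
    """Replace decimal commas with decimal points, leaving other commas alone."""
    return DECIMAL_COMMA.sub(".", text)


print(to_decimal_point("x = 0,5 cm, y = 12,75 cm"))
```

The comma after "cm" is untouched because it is not flanked by digits.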

The CHAIRMAN. You say the Russians won’t put in boxes at the end 
of the sentences; if it is shorthand you put an “X” there. Your 
machine would prefer a box of some sort? 

Colonel KELLOGG. All machines would, sir, because how does the 
machine easily know, without a special program, whether it has 
reached the end of a sentence when it sees “See Fig. 1,” with a period 
after the “Fig.”? It must be sure it isn’t taking a decimal point for 
the end of the sentence. The machine has to know when it reaches the 
end of the sentence. We can program this, but it would save us a lot 
of effort if the Russians and we would substitute a little box for the 
end-of-sentence period. 
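
Note (illustrative, not part of the testimony): The "special program" Colonel Kellogg mentions for finding the end of a sentence can be sketched in Python as follows. The abbreviation list and the two rules (a period between digits is a decimal point; a period closing a listed abbreviation is not a sentence end) are assumptions chosen to cover his "See Fig. 1" example, not a description of any program discussed at the hearing.

```python
import re

# Hypothetical abbreviation list; a working system would need a far longer one.
ABBREVIATIONS = {"fig.", "eq.", "no.", "vol."}


def split_sentences(text):
    """Split text at periods that are neither decimal points nor abbreviations."""
    sentences = []
    start = 0
    for match in re.finditer(r"\.", text):
        pos, end = match.start(), match.end()
        # A period between two digits is treated as a decimal point.
        if 0 < pos and end < len(text) and text[pos - 1].isdigit() and text[end].isdigit():
            continue
        # A period that completes a listed abbreviation does not end the sentence.
        last_word = text[start:end].split()[-1].lower()
        if last_word in ABBREVIATIONS:
            continue
        # Otherwise accept it only when followed by whitespace or the end of the text.
        if end == len(text) or text[end].isspace():
            sentences.append(text[start:end].strip())
            start = end
    remainder = text[start:].strip()
    if remainder:
        sentences.append(remainder)
    return sentences


print(split_sentences("See Fig. 1. The value is 3.5 units."))
```

A dedicated end-of-sentence mark, such as the little box proposed in the testimony, would make both rules unnecessary.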

The CHAIRMAN. I doubt if you can get Mr. Khrushchev to adopt 
something like that, even as simple as it is, to work out for computers. 

Colonel KELLOGG. For propaganda purposes, they might, sir. For 
propaganda purposes they are now printing nice little English ab- 
stracts in the articles which I translate by hand. We are not doing 
anything in return. We could share one journal back, say “Physical 
Review,” or something like it. 

This would help both sides, but now they get the propaganda value 
of doing it first. 

The Cuairman. What propaganda value do they get out of those 
notes ? 

Colonel Keiioce. The propaganda value of putting it in our lan- 
guage, I think has some effect on our scientists. Why aren’t we doing 
thesame? Are we ashamed, or don’t we have anyone who can do this 
in Russian ? 

The Cuatrman. In other words, they are doing a little bragging at 
the bottom of the article ? 

Colonel Ketioge. Yes, sir. They put their abstract of Russian in 
the front, then the article in Russian, then a neat little English ab- 
stract. This is in a journal printed in Russia. 

The Chairman. So it facilitates your picking it up? 

Colonel Kellogg. Yes; this is a help. 

The Chairman. Any further questions?

Mr. Moeller hasn’t asked any questions. 

Mr. Moeller. Are the translations, in your opinion, always accurate? 

Colonel Kellogg. No, sir. 

Mr. Moeller. They are not intended to deceive you, are they? 

Colonel Kellogg. Oh, you meant the abstracts? 

Mr. Moeller. Yes. 

Colonel Kellogg. No, sir; the few I have seen have been fairly 
accurate. In one case I was amused: the translator toned down an 
extravagant statement of the author’s. So I might say they were not 
intended to deceive. 

Mr. Moeller. I can understand, well, with respect to scientific data, 
this rather pooled translation would give you just what you want, 
perhaps. However, with respect to intelligence, knowing there are 
some very fine shades of meaning in certain words, one has to know 
that language to fully understand the intent of the author. 

How are we ever going to accomplish this with respect to intelli- 
gence? We certainly could be going in the wrong direction, could 
we not?

Colonel Kellogg. For this purpose all we can do is the following. 
We can automatically abstract the translations, the whole batch, and 
then see what is worth looking at. When it comes to political shades 
of meaning and nuances, this is what we should save our skilled trans- 
lators for, and not put them on post-editing of scientific output. We 
can use nonlinguists to post-edit a machine product. We need our 
skilled translators for shades of meaning and nuances of that kind. 

The Chairman. Mr. Fulton. 




Mr. Fulton. I am surprised that you have any trouble between a 
period and a decimal point. Don’t you use the usual computer decimal 
point method, giving an extra numeral, and then doing the counting 
over to the place where it is to go? 

Colonel Kellogg. Sometimes, yes, sir, sometimes no. The Russians 
sometimes have a queer habit of putting a period in front of a reference 
coming up, which may happen to come after a numeral. We are 
reading their printed pages. These periods do occur nastily, because 
they don’t use them for a decimal point, they use a comma for a decimal 
point, so it doesn’t bother them. 

Mr. Fulton. Do they use our system of numbering for decimal 
points?

Colonel Kellogg. Yes, sir, but they use a comma where we use a 
period. In fact, in vector analysis, they use a dot where we use a cross, 
which gets very confusing. This we have to change over. 

The Chairman. Couldn’t your input machine control that, couldn’t 
they just eliminate that period?

Colonel Kellogg. It can, sir, but it would save making a more complex 
machine if we had a box. Georgetown, for example, in their 
hand keypunch input, uses a box for that very reason. It just makes 
it that much simpler to handle from then on. 

The Chairman. Couldn’t the machine establish the box? 

Colonel Kellogg. It can. 

The Chairman. I doubt whether you can hope for the Russians to 
put in a box there to help us. 

Colonel Kellogg. Well 

General Ely. The input machine we are seeking is an input machine 
that would take the printed page as printed, and put it into the com- 
puter without any repunching or anything else. 

The Chairman. In other words, automation? 

General Ely. It would scan it electronically. 

The Chairman. Mr. Fulton. 

Mr. Fulton. The question comes up as to the method of getting the 
Russians’ cooperation. Has there been any approach to our U.S. 
delegation in United Nations in making any representations for working 
out some sort of a U.N. program?

General Ely. We know of none; no, sir. 

Mr. Fulton. I was on the delegation last year, and I certainly 
would have brought up something of this type. I am sure the rest 
of the delegation would be very interested in trying to work out a 
method. They will get small areas of agreement on just such technical 
things as this, in order to expand the interchange we are now getting. 

The Chairman. General, we want to thank you very much, you and 
the colonel, for coming here, giving us a very interesting statement. 
I am pleased, personally, that your work shows substantial progress 
and you now can see perhaps at a reasonably early date that you are 
going to strike paydirt, that this project will pay off. 

General Ely. Yes, sir. 

The Chairman. At this time, as I informed the committee, we have 
reserved the last hour this morning for the next witness, who is Dr. 
Edward W. Cannon, Chief of the National Bureau of Standards, 
Applied Mathematics Division. 
So, General, we want to thank both of you for being here, and I 
think we can release you now. If you want to stay we would be glad 
to have you. If you don’t, we will understand your leaving. 

General Ely. It was our pleasure. 

Colonel Kellogg. Thank you, sir. 


STATEMENT OF DR. EDWARD W. CANNON,’ CHIEF, APPLIED MATH- 
EMATICS DIVISION, NATIONAL BUREAU OF STANDARDS; ACCOM- 
PANIED BY KENNETH F. McCLURE, ASSISTANT GENERAL COUN- 
SEL, DEPARTMENT OF COMMERCE 


The Chairman. Dr. Cannon, you are accompanied by Mr. McClure? 

Dr. Cannon. Yes, sir; on my left. 

The Chairman. Mr. McClure, what is your position with the Department 
of Commerce? 

Mr. McClure. I am an Assistant General Counsel. 

The Chairman. That is for the purposes of the record? 

Mr. McClure. Yes. 

The Chairman. Do you have a statement, Doctor? 

Dr. Cannon. Yes, sir. 

The Chairman. Will you proceed with the statement, sir, we will 
be glad to hear it. 

Dr. Cannon. Mr. Chairman and members of the committee, it is 
an honor to have the opportunity to appear before the Committee 
on Science and Astronautics. I am glad that you are able to give 
close attention to the national effort to apply electronic digital 
computers to the translation of natural languages. I consider this 
endeavor a major avenue for critically needed extensive broadening of 
knowledge in our country of scientific and technological develop- 
ments reported in foreign literature. 

The National Bureau of Standards turned its attention to mechani- 
cal language translation by use of electronic computer upon the re- 
quest of the U.S. Army. Our project on mechanical translation has 
been conducted as a service for and under the support of the Army. 


®Dr. Edward W. Cannon is a native of Cannon, Del. He received his bachelor of 
science and master of science degrees in electrical engineering from the University of 
Delaware in 1928 and 1931, respectively, and obtained his doctor of philosophy degree from 
Johns Hopkins University in 1935. He then taught at the University of Delaware as a 
member of its mathematics department. 

From 1942 to 1946 Dr. Cannon was on active duty in the U.S. Naval Reserve. During 
part of this time he served in the Bureau of Ships, Washington, D.C., as statistical quality 
control officer and Executive Officer of its Research and Standards Branch. 

Dr. Cannon joined the National Bureau of Standards in July 1946, as an assistant to 
the Director to participate in the Bureau’s electronic computing machine development 
program. With the establishment of the Applied Mathematics Division in 1947, he was 
made its Assistant Chief and also headed its section responsible for the logical design of 
electronic digital computers, the Machine Development Laboratory. During 1951-52 
Dr. Cannon was stationed at the Institute for Numerical Analysis at the University of 
California, where he played a central role in the institution of design modifications of the 
SWAC, an electronic digital computer developed by the National Bureau of Standards. 

In September 1952, Dr. Cannon was granted a leave of absence to serve as director of 
the logistics research project at George Washington University. He returned to the 
Bureau in September of 1954 to resume his duties as Assistant Chief of the Applied 
Mathematics Division and as Chief of the Mathematical Physics Section. He became 
Chief of the Division in 1955. 

Dr. Cannon is a member of the American Physical Society, the American Mathematical 
Society, the Washington Philosophical Society, the Association for the Advancement of 
Science, the Washington Academy of Sciences, Sigma Xi and the honorary fraternities, 
Tau Beta Pi and Phi Kappa Phi. In addition, Dr. Cannon served for 5 years as a member 
of the editorial committee of the National Research Council Journal, mathematical tables 
and other aids to computation. 
We began our work on MT in 1958, rather recently compared to the 
other groups you have invited to talk with you about this important 
endeavor. Our group is small, consisting of Mrs. Ida Rhodes, project 
leader, and three assistants. The National Science Foundation 
has helped us by detailing Mr. Richard See to work with Mrs. Rhodes 
part time. We do have some volunteer assistance. 

No special equipment is assigned to the project; however, it does 
have access to and has utilized the Bureau’s highspeed electronic 
computer. We are especially interested in the translation of scientific 
Russian. 

I regret that Mrs. Rhodes cannot be here today. She has been 
confined to a hospital bed for almost 2 weeks. As her supervisor, my 
main problem has always been keeping her from working too hard 
and too long. Both as a service to you gentlemen and as a tribute 
to her we shall do our very best to give you the information you desire. 
I am fortunate in having the assistance of Dr. Franz Alt, Assistant 
Chief of the Applied Mathematics Division of the Bureau, and Dr. 
Henry Birnbaum, an assistant to our Director, Dr. Astin. 

We have interpreted our task to consist of making it possible for 
the scientists in our country to keep abreast of the work being carried 
out by their colleagues abroad. This is not a task to be undertaken 
lightly. A mistranslation of a vital scientific fact can direct the 
reader’s mind onto a wrong track, which may swerve him far away 
from the path followed by the original investigator and lead to 
disastrous consequences. We must strain our efforts, therefore, to 
achieve a scrupulously correct, completely faithful rendering of the 
scientific results described by the original author. 

It is a fact, however, that even the human translator seldom achieves 
this desideratum. It is said that during World War II the Russians 
printed all their maps with numerous errors, so as to confuse the pos- 
sible invader. They need take no special precautions to confine the 
utility of their scientific publications; the translators in other lands 
will manage to do this for them. 

It is easy to account for this undesirable state of affairs when we 
examine the requirements for perfect scientific translation. 

The translator must have an excellent command of both source 
and target languages; he must be a devotee of the science treated 
in the source text, and, moreover, he must be thoroughly familiar 
with its special terminology in both languages involved. For example, 
if a translator from the English language renders the term “hydraulic 
ram” into the equivalent of “water goat” in the target language, the 
foreign reader is apt to be more than a bit confused. This example 
is no exaggeration of the sort of English renderings we encounter 
in the translations of foreign scientific articles. But these require- 
ments for an effective human translator of scientific material are sel- 
dom fulfilled in one and the same person, especially in the case of trans- 
lation from Russian to English. 

Considering the formidable difficulties which face the human trans- 
lator, we must exercise extreme care in attempting to use for this 
task a manmade device which possesses neither the senses, nor the 
brain, nor the lifetime of experience bestowed upon man. For ex- 
ample, the conscientious human translator frequently augments the 
information stored in his mind by the use of grammars, scientific 
texts, dictionaries, and special glossaries. This implies that for 
automatic machine translation the vast totality of information, 
whether inherent to the human mind or stored in the accumulated 
literature, must somehow be fed into the electronic processor. Not 
only do we lack, at the present time, adequate devices for such a task, 
but the expenditure of time, labor, and money to store such colossal 
amounts of data will be prohibitive for many decades to come. We 
must therefore lower our sights and make a judicious choice of the data 
to be fed into a translation machine. But this choice depends both 
on the mechanical translation scheme which is being evolved and 
on the type of equipment which will eventually be used. The task 
of the NBS Mechanical Translation Group thus assumes a double 
aspect—to strive to achieve a workable mechanical translation scheme 
and, simultaneously, to keep abreast of and be prepared to utilize 
promptly engineering developments. 

I may describe the NBS approach to mechanical translation, briefly, 
in the following manner. It is based upon the fact that human speech, 
through frequent repetition and man’s striving to be understood by 
his fellows, is constrained within certain recognizable patterns. Although 
individual words may be extremely unruly, so that the number 
of grammar exceptions sometimes exceeds the number of rules, 
the role each word plays in the sentence and the sequence in which 
the words occur are to a large degree systematized and therefore pre- 
dictable. This prediction is made possible by use of the general rules 
of grammar and, in addition, the peculiar properties and affinities 
of words. As an illustration of the general grammar, we recall that 
an active verb usually takes a direct object, and perhaps an indirect 
object as well, as in “I gave the book to John.” In the Russian lan- 
guage, with which we are concerned, the latter class of prediction 
rules—depending on the peculiar properties and affinities of words— 
is demonstrated for example by the fact that different prepositions 
may govern different cases, that nouns are very likely followed (but 
immediately or not at all) by other nouns in apposition or in the 
genitive case, and so on. 

Another and very interesting idiosyncrasy of the Russian language 
is that in certain instances, after the numerals 2, 3, or 4, even if com- 
bined with others (as 52, 64, etc.) the noun is in the genitive singular, 
while after all other numerals the noun is in the genitive plural. To 
us this would be like saying 6 men but 62 man. These properties of 
Russian words, which are strange to us indeed, add to the difficulty of 
the human translator. They are employed quite effectively in our 
method, however, in the grammatical analysis which is essential in 
machine translation and translation by human beings alike. Indeed, 
this feature of the Russian language enables us to list for each entry 
in our glossary, along with the conventional information found in 
dictionaries, a set of predictions regarding the nature of the words 
which may follow the given word. Moreover, each prediction can 
be and is accompanied by an index indicating the probability of its 
fulfillment. 
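
[Editorial illustration, not part of the testimony.] The numeral rule described above can be written down as a small prediction function. This is a minimal sketch, assuming a whole number used in a nominative or accusative context; the function name is hypothetical. It goes slightly beyond the rule as stated in the testimony by adding two standard refinements of Russian grammar, each flagged in a comment: numbers ending in 11 through 14 take the genitive plural, and other numbers ending in 1 take the nominative singular.

```python
def noun_form_after(number):
    """Predict the form of a Russian noun following a whole number.

    Assumes a nominative or accusative context; other cases behave differently.
    """
    last_two = number % 100
    last = number % 10
    # Refinement beyond the testimony: 11-14 always take the genitive plural.
    if 11 <= last_two <= 14:
        return "genitive plural"
    # Refinement beyond the testimony: 1, 21, 31, ... take the nominative singular.
    if last == 1:
        return "nominative singular"
    # The rule as stated: 2, 3, 4, even when combined with others (52, 64).
    if 2 <= last <= 4:
        return "genitive singular"
    # All remaining numerals.
    return "genitive plural"


if __name__ == "__main__":
    for n in (6, 62, 52, 64, 12, 21):
        print(n, noun_form_after(n))
```

For the witness's own example, the function returns the genitive plural after 6 and the genitive singular after 62, matching "6 men but 62 man."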

In our approach at the NBS the machine first examines each word 
in the Russian text and establishes its grammatical interpretations 
and meanings. I take a moment here to emphasize as strongly as 
possible that very rarely does a source word possess a unique inter- 
pretation, either as regards its grammatical structure or its particular 
shade of meaning. 

A simple example will suffice to illustrate this assertion. A Rus- 
sian translator confronted with the isolated English word “b-o-r-e” 
would be hard put to decide whether it is (1) a noun describing a 
feature of a gun; (2) a noun characterizing a certain type of human 
being; or perhaps (3) the past tense of the verb “to bear,” which in 
itself has various meanings. We say, “The wild beast bore down upon 
the explorer” or “The queen bore the king a royal heir” or “The 
martyr bore his cross with angelic patience.” In no other language, 
obviously, could we expect all of these connotations to be expressed 
also by a single word. I feel certain that one would expect similar 
ambiguity to exist when one goes the other way—from Russian to 
English. It does. 

To return to our process of machine translation, the machine examines 
each Russian word to ascertain from its grammatical form its 
meaning or meanings in English, and what other Russian words it 
leads us to expect. These expectations are pooled with others arising 
from the rules of conventional grammar and are compared with sub- 
sequent occurrences. Occurrences which do not match the existing 
predictions are stored, for further use, in what we call a hindsight 
pool, and are subsequently reconciled. 
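
[Editorial illustration, not part of the testimony.] The bookkeeping described above can be shown with a toy example. This is a minimal sketch only: the glossary, the role names, and the function are hypothetical, English words stand in for Russian ones, and the final reconciliation of the hindsight pool is not attempted. It should not be read as Mrs. Rhodes's actual procedure. Each word either fulfils a pending prediction or is set aside in the hindsight pool, and then contributes its own predictions.

```python
# Toy glossary: word -> (grammatical role, roles the word leads us to expect).
GLOSSARY = {
    "i": ("subject", ["verb"]),
    "gave": ("verb", ["direct_object", "indirect_object"]),
    "book": ("direct_object", []),
    "john": ("indirect_object", []),
    "quickly": ("adverb", []),
}


def analyze(words):
    """Scan left to right, matching each word against pending predictions."""
    predictions = ["subject"]  # from general grammar: a sentence expects a subject
    fulfilled = []
    hindsight = []
    for word in words:
        role, expects = GLOSSARY[word.lower()]
        if role in predictions:
            predictions.remove(role)          # the prediction is fulfilled
            fulfilled.append((word, role))
        else:
            hindsight.append((word, role))    # unexpected: keep for later
        predictions.extend(expects)           # pool the word's own predictions
    return fulfilled, hindsight, predictions


if __name__ == "__main__":
    print(analyze(["I", "gave", "book", "John"]))
    print(analyze(["I", "quickly", "gave", "book"]))
```

In the second call the adverb is not predicted by anything before it, so it lands in the hindsight pool, and the prediction of an indirect object is left unfulfilled at the end of the sentence.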

The English equivalents found in the dictionary during the analysis 
I have described are synthesized into an English sentence, giving an 
output which is in pidgin English and very crude, but which has the 
correct grammatical construction. 

As far as I know, the foresight or predictive technique, which is 
now being called predictive analysis, together with the use of hind- 
sight pools, originated with Mrs. Rhodes, leader of the NBS Mechani- 
cal Translation Group. I believe that it is a very powerful and a 
promising technique for the mechanical translation of languages, and 
this opinion, I am pleased to state, appears to be shared by many 
students of languages and members of mechanical translation groups 
both in this country and abroad. As you no doubt were informed 
yesterday, for example, Dr. Oettinger’s group at Harvard University 
is now concentrating major effort in exploring this technique. 

Other unique features of Mrs. Rhodes’ approach to MT are the 
use of repeated passes (successive approximation techniques) to trans- 
late the more difficult sentences, and the provision, for each translated 
English sentence, of indicators of the measure of reliability the reader 
may attach to it. 

At the present time, her scheme is able to cope with the syntactical 
aspects of the mechanical translation problem. In other words, the 
predictions enable the machine to pinpoint the unique grammatical 
interpretation of a source word, so that it assumes its proper role 
and sequence in the target sentence. We wish we could say as much 
for the semantic, or multiple-meaning aspect. This is a far more 
difficult task, as it involves the examination of the context of the 
word under consideration, and the derivation of the proper infer- 
ence from the association of ideas revealed by the surrounding words. 
Remember the example, the English word “bore.” Until this formi- 
dable problem is solved (and we are not yet certain that it can be) 
we shall be forced to print several meanings for a single source word 
and, on occasion, several versions of the same sentence. 

For this reason, and many others which we shall not enumerate here, 
our final translation will be quite inelegant—as we have said, even 
in pidgin English. We do feel that the crude translation yielded 
by our method will give the reader a correct image of the meaning in 
the foreign text. Consequently we are now concentrating on completing 
and testing thoroughly a set of computer instructions embodying 
our techniques. 

I thank you gentlemen for your kind attention. 

The Chairman. Thank you very much, Dr. Cannon. That is a 
very informative statement. 

You mentioned in your statement that several groups preceded 
the National Bureau of Standards in research on mechanical trans- 
lation. Were you concerned, when the National Bureau of Standards 
went into this program, that there was duplication? 

Dr. Cannon. We certainly did not want to duplicate other ef- 
forts, sir, and we were concerned over that possibility. Therefore, 
we suggested to our potential sponsor—and found hearty agreement 
there—that a survey of existing work be conducted to ascertain the 
nature of the programs underway in more detail than they had at 
their hands, and to determine whether there was room for another 
project. 

This survey was conducted, and the recommendation reached by 
those conducting it was that it would be advisable for us to enter the 
field. 

The Chairman. What does your program have that the other 
programs do not have? Where was there a deficiency in the other 
programs? 

Dr. Cannon. I did not feel that there was a deficiency in the other 
programs. It is hard for one, at least for me, to decide just what 
degree of the following of what might appear to be parallel paths 
is productive in conducting research. It was not a case of feeling 
that there was a deficiency in the programs underway, but rather 
a case of our feeling that we did have an approach which was suf- 
ficiently different from those under investigation by the remaining 
projects, and which offered sufficient promise to make it worthwhile 
to enter the field. 

The Chairman. Do you still feel that this additional program 
by the Bureau of Standards is justified?

Dr. Cannon. Yes, sir; I do. 

Of course, I can only comment from my perspective, but I do feel 
that we have made some significant and worthwhile contributions to 
this field of endeavor. I do think the decision that we enter the field 
has been proven sound. 

The Chairman. Well, now, let me ask you this question. It has 
been said here that a word-by-word translation followed by a hand 
translation which would perfect the machine translation would prob- 
ably be satisfactory, at least for scientific and technical material. 

Is there a need for a more sophisticated method like yours? 

Dr. Cannon. I believe very definitely that there is, sir. One way 
of expressing my present opinion on the word-for-word translation 
followed by what I would call smoothing, that is, such translation 
followed by a treatment by a person who is certainly not a translator 
but who makes an attempt to smooth the language to make it more 
palatable and more meaningful to the reader—my present opinion 
of that could be summed up in this statement: that there is nothing 
wrong with it that could not be cured by a good, efficient, human 
translator. I think that the requirements for effective utilization of 
such a procedure, the preservation of meaning from the source to the 
target language, on what we call the posteditors, are more stringent— 
at least as stringent—than those on the human translator. 

I believe it would be difficult to find personnel to do the job, and 
I doubt that the output would be as fast—well, I think the output, 
the rate of translation, if the posteditors conscientiously should strive 
for accuracy, would be no greater than that of the human translator. 

The Chairman. How many machine runs have you made, yourselves?

Dr. Cannon. Our method, I should say, we think of as consisting 
of two parts: the part which we call the glossary or dictionary lookup 
part, and the part that has to do with the syntactical analysis and 
synthesis of the target sentence. 

Now, on part 1, the glossary lookup part, which incidentally would 
yield a word-for-word output if that were considered our complete 
method, on that part we have made some machine runs. At the mo- 
ment I don’t know how many, but we have made runs to test the 
method. On the second part, since it is not completed, we have not 
made machine runs. 

The Chairman. Tell me, have you completed your glossary for the 
machine? 

Dr. Cannon. We are well underway on the completion of a glossary 
which will enable us to test our method in mathematics. We are 
restricting the field to that of mathematics at the beginning. This 
will be a very small glossary compared to the 50,000 words mentioned 
in preceding testimony. 

The Chairman. About how large will it be? 

Dr. Cannon. It will consist of about 2,500 words. 

The Chairman. You have completed it? 

Dr. Cannon. It is not completed, but is well on the way to completion; 
essentially it is completed. 

The Chairman. When in your opinion will the National Bureau 
of Standards mechanical translation scheme be completed ? 

Dr. Cannon. I would like to say, sir, that we are talking about syn- 
tactical analysis, not the solution of the semantic problem, which is 
our method as it is at present. This syntactical analysis we feel con- 
fidently will be completed in 2 years. 

The Chairman. At that time will you have a product which will 
be usable?

Dr. Cannon. We think, as far as the syntactical analysis is con- 
cerned, the error rate will be well within the 10 percent that has been 
mentioned. 

The Chairman. Then you take that product and turn it over to a 
hand translator who will go over it and add refinement to it? 

Dr. Cannon. The feature of our method, sir, which we consider 
quite important, is that it includes a signal to the posteditor that 
the machine had difficulty. Postediting to us would mean scanning 
the result, looking for such signals, and probably in many instances 
going back to the original text in those cases. We think, with that 
modification by a posteditor, our product would be valuable to a 
person knowledgeable in the field, and that I believe is the most we 
can say, at this moment. 

The Chairman. Now, before this committee we have had testimony 
showing the overabundance of publications coming out of Russia that 
have been translated. If the scientist can’t find time to keep abreast 
of the technical literature in his own tongue, it seems that he will be 
unable to read the volumes of material translated. Are you not then, 
in mechanical translation, merely going to fill library shelves with 
inactive and unused volumes? 

Dr. Cannon. That is the $64 question, but I believe the answer is 
“No.” I am certain it is no. I can think of many fields; for example, 
metallurgy, extremely high temperature research, extremely low tem- 
perature research, pulse circuitry engineering—which is basic to radar, 
for example, and to the electronic computer, an application of which 
we are discussing—of many fields in which I believe all active scientists, 
engineers, technologists will read everything they can get their 
hands on that is published. I think translations in those fields will be 
very valuable and would be used. Moreover 

The Chairman. Of course, some of them would gradually go out 
of date, because they would be superseded by more up-to-date translations. 
But what about your cost on this, your cost compared to hand 
translations?

Dr. Cannon. My feeling is that we can only predict or give an 
estimate of cost. I can do only that, because while I am optimistic 
with respect to the utility of our method—Mrs. Rhodes’ method—it 
still is not completely tested. I think, however, that to compete on 
the basis of cost with the human translator, machines more powerful 
and bigger than those now available must be used. 

It is somewhat paradoxical, but the more expensive machine, as we 
are accustomed to measure expense by initial cost or hourly rental, 
is the one which will be the more economical in this application. 

The Chairman. The higher your initial investment, the more 
economical will be the product, is that it?

Dr. Cannon. Yes, sir; in this sense: that the reason for continued 
development, for one thing, is to lower the cost of the basic operations. 
For example, multiplication cost goes down with the larger machines. 
Machines are sufficiently flexible now to make more expensive ma- 
chines unattractive, unless they perhaps have larger memories and 
are cheaper to use. So considering such machines, and I will repeat, I 
am predicting—I would not want to be called to account for my accu- 
racy—but I think that the machine cost will be less than the cost of 
translation by a single human translator, based upon cost per word; 
that is, taking, say, an average cost of translation by hand at $12 per 
thousand words, or say about 1 cent a word, the machine cost will 
eventually, in my opinion, be less than that. 

The Chairman. Aren’t you pretty conservative when you say 
eventually the cost will be less than that? Won’t it be greatly re- 
duced over the hand cost? If it isn’t, what is the justification of put- 
ting a couple of million dollars in one of these machines? 

Dr. Cannon. May I return, sir, to our motivation? 

We are extremely interested in doing what we can to help in the 
dissemination of information, of scientific and technical information. 
I think that is our primary motivation. I think the machine, because 
of its speed of operation, if a successful method is devised and a perfect 
machine is used, will be an extremely rapid translator, and that 
I think is our need, sir. It is not the cost, which of course one must 
always consider, which frightens us, but the fact that we simply do 
not have enough translators. And we are well known not to excel 
as linguists. So how rapidly we can interest young people in learn- 
ing foreign languages like Russian is another question. 

The Chairman. Mr. Fulton. 

Mr. Fulton. Could I say, unless you come in here as a witness with 
either homicidal or equal tendencies, you probably will be faced with 
economy. Anybody who doesn’t have some end result that is for the 
purpose of destruction or opposition to somebody or some nation, is 
usually asked, “Why are you doing it?” 

I think there is the basic scientific reason for your research. You 
are learning language that would help in our own communications, 
and likewise aid us in many aspects of our economy if we can work 
out these matters. For example, on handling the communications 
from a satellite, you might have a satellite with 4 channels with 
500 digits a second. You want to put them into package form so that 
they go in and come out in proper form so that the translation is 
good. 

The things that you find here about translating between two lan- 
guages and the substance of language will be helpful to us on these 
vast communications for every purpose in the future, won’t they? 

Dr. Cannon. I believe so, sir. 

Mr. Fulton. Would you outline for us shortly what the various 
systems are, capsulize them; and then there has been comment here 
that there has been dispute among the various proponents of systems, 
and tell us what the disputes are. If you would like to do it later, 
put it in the record. But I would like to know what the various 
systems proposed are, who is doing them, and also what the disputes 
are among these proponents. 

Dr. Cannon. I shall attempt to collect this information for the 
record, and may I say something here which I think is really unneces- 
sary but I should have said for the record in the beginning: I am 
really not an expert. I am appearing for our expert, Mrs. Rhodes, 
who unfortunately couldn’t be here. The reason I mention this, for 
one thing, is that there could be some slight delay in our collecting 
this information in as complete form as you could require it. 

The Chairman. That may be, but we don’t want to hold up the 
report, you know. 

Mr. Fuuron. I mean, a capsulized short version, so anybody look- 
ing at the record can see what the main problems are. 

The CuarrMan. You could get that in in the next few days, couldn't 
you? 

Dr. Cannon. A week or 10 days. Would that be satisfactory? 

The Chairman. All right, make it not longer than 10 days. 

Mr. Fulton. You may ask the National Science Foundation to cooperate 
with you. 

The Chairman. I think Dr. Cannon can do it. He is a little modest 
there, but I think he can do it all right. 

Is there anything further? 

Mr. Fulton. That is all. 

(The information requested is as follows:) 


DIFFERENCES IN APPROACH TO MACHINE TRANSLATION 


When the idea of automatic translation emerged, at first only word-for-word 
translation was considered. The inadequacies of this plan were obvious. They 
manifested themselves principally in the facts that some source words had more 
than one target word associated with them, and that the word order in the target 
language often has to be different from that in the source language. Speaking 
of multiple-target words and of word order, however, is not the most useful ap- 
proach to classification of problems. The change in word order can be under- 
stood by analyzing the grammar of a sentence. Multiple-target words can be 
either different grammatical forms belonging to the same stem, or words having 
entirely different meanings. Thus, in the main, the problems encountered in 
translation are classified into syntactic and semantic problems. 

The first attack on semantic problems was outlined in the memorandum by 
Warren Weaver¹ in 1949. Early ideas ran as follows: A word like “nucleus” 
might have one meaning in a context of physics, another meaning in a context of 
biology, etc. There are other words which occur only in contexts of physics, still 
others which occur in biology, and so on. In the dictionary, each word would be 
coded according to the fields in which it is used. If a word of multiple meaning 
is encountered, a number of neighboring words both before and after the ques- 
tionable one would be searched in order to get a “majority opinion” concerning 
the meaning to be selected. 
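
The “majority opinion” scheme just described can be sketched in a few lines of Python. This is an illustration only: the field codes, the toy dictionary, the window size, and the function name choose_field are assumptions made for the example and are not taken from the memorandum or from any project named in this record.

```python
from collections import Counter

# Hypothetical field codes for a toy dictionary (illustrative only).
FIELD_CODES = {
    "nucleus": {"physics", "biology"},   # word of multiple meaning
    "electron": {"physics"},
    "neutron": {"physics"},
    "cell": {"biology"},
    "membrane": {"biology"},
}

def choose_field(words, index, window=3):
    """Pick a field for words[index] by polling neighbors on both sides."""
    candidates = FIELD_CODES.get(words[index], set())
    votes = Counter()
    lo = max(0, index - window)
    hi = min(len(words), index + window + 1)
    for pos in range(lo, hi):
        if pos == index:
            continue
        # Only neighbors coded for exactly one field cast a vote.
        fields = FIELD_CODES.get(words[pos], set())
        if len(fields) == 1:
            votes.update(fields & candidates)
    if not votes:
        return None  # no majority opinion available
    return votes.most_common(1)[0][0]

sentence = "the electron orbits the nucleus near a neutron".split()
print(choose_field(sentence, sentence.index("nucleus")))
```

When no coded neighbor falls inside the window, the sketch returns None rather than guessing, which mirrors the limitation taken up in the next paragraph.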

This idea of word classification by context does not dispose of the problem 
entirely. The choice among multiple meanings may be determined not by con- 
text but by certain successions of words. For instance, the same Russian word 
may have to be translated into English as “prove” in the sentence “Prove the 
theorem of Pythagoras” or as “show” in the sentence “Show me how to con- 
struct a regular pentagon.” Attempts are being made to establish large-scale 
statistical information about pairs of words occurring together. Another 
approach has been to attempt a more refined classification of word meanings, 
resulting in something like an oversized Thesaurus. Still other investigators 
maintain that the magnitude of the semantic problem can be greatly reduced by 
attempting whenever possible to find neutral translations which will cover as 
many meanings of the source word as possible. 
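
The word-pair statistics mentioned above might be used along the following lines. The counts in PAIR_COUNTS are invented for illustration, and pick_translation is a hypothetical helper. The sketch shows only the principle of letting a neighboring word decide between “prove” and “show.”

```python
from collections import Counter

# Hypothetical counts of (target choice, nearby word) pairs, as might be
# tallied from previously translated text. The numbers are made up.
PAIR_COUNTS = Counter({
    ("prove", "theorem"): 40,
    ("show", "theorem"): 3,
    ("prove", "how"): 1,
    ("show", "how"): 25,
})

def pick_translation(options, neighbors):
    """Return the option that co-occurs most often with the neighbors."""
    def score(option):
        # A Counter returns 0 for pairs that were never observed.
        return sum(PAIR_COUNTS[(option, n)] for n in neighbors)
    return max(options, key=score)

print(pick_translation(["prove", "show"], ["the", "theorem", "of", "pythagoras"]))
print(pick_translation(["prove", "show"], ["me", "how", "to", "construct"]))
```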

The three approaches to the semantic problem which we have just outlined 
might be named the statistical, systematic, and empirical approach. The same 
three methods of approach can be distinguished in dealing with the syntactic 
problem. 

Here, the statistical approach consists in searching through large amounts of 
source texts and enumerating the frequency of certain word sequences. For 
instance, how often does an adjective precede a noun, and how often does it 
follow it? The systematic approach attempts to set up a system of rules—in 
other words, a machine program—which analyzes the syntactic structure of 
each source sentence. That is to say, it identifies subject, predicate, direct 
object, etc., of each sentence or clause. This frequently has to be preceded by 
a grammatical analysis of each word, just as conventional grammar is divided 
into morphology and syntax, the former dealing with the inflectional forms of 
each word, the latter with the function of each word in the sentence. Finally, 
the empirical approach starts by selecting a few very simple rules for translation, 
tries them on a body of text and notices where they fail, corrects the rules or 
introduces new ones to cope with the observed failures, tries the revised rules 
on a larger body of text, and so forth. 
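
As a small illustration of the statistical approach to syntax, the question “how often does an adjective precede a noun, and how often does it follow it?” can be answered by counting adjacent word classes. The tag names ADJ and NOUN, the toy sentences, and the function name are assumptions for this sketch. A real study would run over a large tagged corpus.

```python
from collections import Counter

def adjective_noun_order(tagged_sentences):
    """Count adjacent ADJ-NOUN and NOUN-ADJ sequences in tagged text."""
    counts = Counter()
    for sentence in tagged_sentences:
        tags = [tag for _, tag in sentence]
        for first, second in zip(tags, tags[1:]):
            if (first, second) == ("ADJ", "NOUN"):
                counts["adjective precedes noun"] += 1
            elif (first, second) == ("NOUN", "ADJ"):
                counts["adjective follows noun"] += 1
    return counts

# Toy corpus of (word, word class) pairs, for illustration only.
corpus = [
    [("red", "ADJ"), ("house", "NOUN")],
    [("attorney", "NOUN"), ("general", "ADJ")],
    [("large", "ADJ"), ("old", "ADJ"), ("tree", "NOUN")],
]
print(adjective_noun_order(corpus))
```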

From a slightly different viewpoint we may distinguish between the use of 
conventional grammar and the design of new systems of grammar (or “lin- 
guistic structures,” as they are called), which are intended to be better suited 
to mechanical analysis than is conventional grammar. 


1 Weaver, W., Translation. “Machine Translation of Languages,” W. N. Locke and A. D. 
Booth, eds., Wiley, New York, 1953. 

There are other differences among the various groups working on machine 
translation. For instance, in designing the dictionary or glossary, some propose 
to list in the glossary every inflectional form of every source word, while others 
propose to list only the stem, or equivalently some canonical form such as the 
infinitive of a verb and the nominative singular of a noun. Russian nouns have 
a dozen inflectional forms, adjectives and verbs many more. Thus the size of 
the required glossary is greatly affected by this decision. 
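
The effect of this decision on glossary size can be made concrete with a toy example. The sketch below assumes a stem glossary with two transliterated Russian nouns and a single list of noun endings. The names STEMS, NOUN_ENDINGS, and lookup are hypothetical, and real Russian morphology needs far more than one ending list.

```python
# Toy stem glossary and ending list (transliterated; illustrative only).
STEMS = {"stol": "table", "zavod": "factory"}
NOUN_ENDINGS = ["", "a", "u", "om", "e", "y", "ov", "am", "ami", "akh"]

def lookup(word):
    """Find a stem by stripping the longest matching ending."""
    for ending in sorted(NOUN_ENDINGS, key=len, reverse=True):
        if word.endswith(ending):  # the empty ending always matches
            stem = word[: len(word) - len(ending)]
            if stem in STEMS:
                return STEMS[stem], ending
    return None

# A glossary of inflected forms needs one entry per stem per ending,
# whereas the stem glossary needs one entry per stem plus the ending list.
full_form_entries = len(STEMS) * len(NOUN_ENDINGS)
print(lookup("stolami"), len(STEMS), full_form_entries)
```

The stem glossary is smaller but every lookup must do the stripping work, which is the trade-off between the two proposals.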

Other differences are found in limiting the scope of a machine translation 
project. Some groups are satisfied to translate into a kind of pidgin English. 
Some are resigned to leaving certain semantic ambiguities unresolved and print- 
ing out multiple meanings. Some are willing to admit failure in a small per- 
centage of all cases. Some will even admit undetected errors in the translation, 
a point of view which others consider dangerous. Some propose to use a 
man-machine partnership rather than letting the machine do the entire job. In 
these cases the machine prepares certain aids to translation—at best a kind of 
preliminary draft, and these are used by a “posteditor” in producing a polished 
translation. Preeditors are less frequently contemplated, but a certain amount 
of preediting may be combined with manual key punching of the text. 


SURVEY OF PROJECTS 


The most important ones among the mechanical translation projects in West- 
ern countries are enumerated, in geographical order from west to east, and 
each project characterized briefly. 

1. University of Washington, one of the oldest projects, concerned with trans- 
lation from German and Russian to English. Extensive dictionary of inflected 
forms. Word-for-word translation, study of selected grammatical and semantic 
problems. 

2. University of California, Berkeley, one of the youngest projects, Russian to 
English. Emphasis on economy in the use of the dictionary. 

3. Rand Corp., Russian to English, empirical and statistical approach, some 
work on semantic problems. Large corpus of transcribed Russian material in 
physics and mathematics. 

4. Ramo-Wooldridge, Russian to English, empirical approach, emphasis on 
programing methods and on display of results in a form which facilitates con- 
tinuing revisions. In its approach to grammar this project is close to one of the 
groups formerly at Georgetown University. 

5. University of Texas, a newcomer, German to English; concentration on 
grammatical problems. 

6. Wayne State University, a new group, Russian to English; statistical ap- 
proach. Cooperates with Ramo-Wooldridge and with a group formerly at George- 
town University. 

7. Georgetown University, one of the oldest groups, at one time comprised four 
separate projects. One of these is disbanded, its ideas being continued at Ramo- 
Wooldridge and Wayne. Another, Russian to English on an empirical basis, 
recently moved from Georgetown to a private corporation, CEIR. 

A third project at Georgetown, called “General Analysis Technique,” Russian 
to English, has made considerable progress with syntactic analysis, which is being 
developed in part empirically and in part systematically, staying fairly close to 
conventional grammar. A good-sized dictionary has been assembled and a large 
corpus of text examined. Heavy reliance is placed on postediting. 

The fourth of the Georgetown projects works on French to English, is strictly 
empirical, lays stress on advanced programing techniques. 

There are also small-scale efforts devoted to Chinese and Arabic. 

Most projects at Georgetown are concerned with source texts in the field of 
chemistry. 

8. National Bureau of Standards, a relatively recent project, Russian to Eng- 
lish, systematic approach to grammatical (morphological and syntactic) analysis, 
staying close to conventional grammar. Emphasis on economical use of 
dictionary. Uses source texts in mathematics. 

9. University of Pennsylvania, systematic approach to grammatical structure 
of languages, especially English. 

10. IBM Corp., concerned mostly with hardware, but also with supplementary 
systems studies. 

11. Massachusetts Institute of Technology, one of the oldest projects, has 
pioneered in developing new theories of the grammatical structure of languages. 
Also works on syntactic problems of German-to-English translation and on general-purpose 
programing systems for machine translation in general. 

12. Harvard University, Russian to English, has compiled a large dictionary 
and worked systematically on methods and machine codes for dictionary compila- 
tion and updating, and on word-for-word translation. More recently concerned 
with grammatical analysis on lines similar to National Bureau of Standards, and 
with general theory of language structures. A related project is in operation 
at the Arthur D. Little Co. 

13. Birkbeck College, University of London, apparently the oldest group in the 
field. German, French, and some Russian to English. Morphological analysis 
of source words, some syntax, largely empirical. 

14. Cambridge University, England, general linguistic theory, semantic problems. 

15. University of Milan, Italy, a highly theoretical, long-range approach. 

The Chairman. Mr. Moeller. 

Mr. Moeller. I may say, Dr. Cannon, you have made an excellent 
presentation. This may sound a bit facetious to you, but I am sure 
you are thinking of the decade following this one, too, as all of us 
might be. If we can devise a mechanical translator of the reliability 
that we want with these various computing devices, what is going 
to prevent us now in the next decade from being able to, before people 
express their thoughts in a language, get those thoughts without the 
language? Are we ready for that in 1970? Not exactly mental 
telepathy, but certainly we ought to put Mr. Khrushchev in something 
and find out what he is thinking, so Mr. Eisenhower would know what 
to do about the situation. If we had that today, we might spare ourselves 
embarrassment. Surely that is coming, isn’t it? I understand 
medicine is doing something of this sort. I am sure we will be able 
to do it also, won’t we? 

Dr. Cannon. Well, a few years back one heard of the Rhine experiments 
in which perhaps an approach to this sort of thing was being 
considered. I haven’t read of them recently. I probably should 
stand on my conservatism and say I just don’t know. It would certainly 
be a desirable achievement, particularly if we could keep the 
information to ourselves. 

Mr. Moeller. Well, I am just throwing this in. Ten years ago I 
am sure there weren’t many people thinking about the ability of these 
devices. 

Dr. Cannon. That is true, sir. I think that little more than 10 
years ago few people really felt that the electronic digital computer 
was more than a dream, and that a person was impractical who did not 
feel that such devices would never be realized. 

Mr. Moeller. The subconscious mind stores a lot of things to which 
expression is given at some future time. Somebody may be able to 
explore this subconscious mind and get the information; as I said 
before, it is stated in a language. 

Mr. Fulton. Would you yield? 

Mr. Moeller. Yes, I would. 

Mr. Fulton. In the February 1910 issue of “Popular Mechanics,” 
there is not only an article on this, but there is a diagram of how to 
do it and how to telephone it on a dial. 

Mr. McDonough. Didn’t Jules Verne write what we are now finding 
through the bathyscaph and what we are finding through the satellites? 
He had thoughts of actions of future generations and put it 
in words. It is nothing new. It has been going on, it is visionary 
in literature, and literature has been expounding what the next generation 
is going to be thinking and doing. They don’t believe them, 
but time comes around and experience proves that some of these things 
are right. 

If we had such equipment as you say, that Mr. Khrushchev might 
be analyzed by it; the difficulty is to get him into the equipment for 
analysis. 

Mr. Fulton. I know people sitting here 50 years from now will 
remember Mr. Moeller’s remark. 

Mr. Moeller. Mr. Chairman, I merely say this because we have got 
to stay several jumps ahead of them, and I think we are getting there, 
and I would just like to be 10 years ahead of them on this other idea. 

Mr. McDonough. Mr. Chairman. 

The Chairman. Mr. McDonough. 

Mr. McDonough. I would like to ask Dr. Cannon: You cite several 
instances where the words can be interpreted in various ways. Aren’t 
there already Russian translators who could tell us what these words 
mean, and avoid the possibility of going off in the wrong direction? 

Dr. Cannon. Yes, sir; there are, but there are too few of them to 
meet the demand for the translation. 

Mr. McDonough. I don’t mean American-educated Russian translators, 
I mean Russians in this country who are citizens of the United 
States that know the Russian language. 

Dr. Cannon. Well, it is true that we have people who are bilingual 
and with such qualifications. One of the problems we feel to be 
important is to avoid—in order not to rob Peter to pay Paul—with 
respect to people with these qualifications, but who are probably 
doing essential work in other areas, to avoid pulling them off of those 
tasks and making them translators. That is one of the difficulties. 

Mr. Bass. Dr. Cannon, do you know how many translators we have 
now, translating scientific—Russian scientific—information into Eng- 
lish? 

Dr. Cannon. Not offhand. 

Mr. Bass. Would you say it is a matter of hundreds or dozens, or 
less than that? 

Mr. McDonough. Do you mean in the Government service? 

Dr. Cannon. I think hundreds. 

Mr. Fulton. There is a total exchange between the United States 
and Russia of 85,000 to 90,000 pamphlets and monograph books each 
year. That is a total of about 180,000 to 190,000 that have come 
through. 

The Chairman. Then, in addition to that, the departments make 
a lot of purchases direct from magazines and articles and have them 
photostated. 

Mr. Bass. Were you able to answer my question? 

Dr. Cannon. I am sorry, sir, you didn’t get an answer from me. I 
feel certain it is more than dozens, and more on the order of hundreds, 
but may I confirm that? 

Mr. Bass. Yes. 

(The information requested is as follows:) 

The attempt to ascertain the number of translators in the United States 
qualified to handle scientific Russian was unsuccessful. Certainly some Government 
agencies employ such translators, and people with the required qualifications 
can be found in industry and in our educational system. However, no 
central roster of such personnel was uncovered, or could be assembled. 

One of the difficulties in estimating our present capability in the translation 
of technical Russian is caused by translators who work on a spare-time or part- 
time basis. Such translators do not need to advertise widely to obtain requests 
for their services. While they make a major contribution to the interchange of 
scientific information it is apparently impossible to determine their number. 

The Chairman. You said something here in your statement about 
the ability of the machine based on hindsight and foresight. Can 
you tell us a little bit more about how a machine can have a hindsight 
and how it can have a foresight? 

Dr. Cannon. Yes, sir. Of course, our terms are really terms coined 
by Mrs. Rhodes. Now we normally think faster than we speak. It 
is quite likely that usually the complete sentence is in the mind of the 
speaker before he begins uttering it. 

People using the same native tongue, citizens of the same country, 
are accustomed to the same locutions, so that the hearer is able to an- 
ticipate what will follow from what he has already heard the speaker 
utter, usually. This anticipation, we call it predictions, are incorpo- 
rated into the machine program, and, of course, the term “subjective” 
has been used. I think it is clear that for this approach we would 
have to have a program written by someone who has a very intimate 
knowledge of the source language, the Russian language in this case. 
Now, the machine is instructed in our program as it examines the 
source, or the Russian words, to look into the location, we call it the 
pool, where these anticipations or predictions have been stored. We 
call this the foresight pool. In case a source word satisfies one of the 
predictions, then the machine cancels that prediction, removes it from 
storage, and proceeds to the next source word in its analysis. 

In case there is not this fulfillment of expectation, the machine indicates 
this failure in what we call the hindsight pool, hoping for reconciliation 
later, that is, hoping that later words or later occurrences 
will fulfill the prediction. 

Therefore, as the machine goes along scanning the text, it must look 
up the foresight pool, it must add to the foresight pool on the basis of 
the grammar, including these affinities of words, the predictions that 
words themselves generate, apart from the general grammar which 
we mentioned before. It must keep the foresight pool current. It 
must look to the foresight pool for possibility of canceling predictions 
that have been fulfilled, and look at the hindsight pool. 

The Chairman. In other words, it picks out key expressions and 
incorporates them in the foresight pool? 

Dr. Cannon. Yes. 

The Chairman. If they don’t turn out on the basis of the key word 
at the beginning of the expression, that is, if they do not turn out 
properly, then they are dumped into the hindsight pool? 

Dr. Cannon. Yes, sir. Expressions which, according to the program, 
the machine expects on the basis of what has been used in the 
same language before, what has already been uttered, or, more pre- 
cisely, what has already been scanned by the machine. 
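
The scanning procedure Dr. Cannon describes can be sketched loosely in Python. This is not Mrs. Rhodes’s program. The word classes, the table of predictions each class generates, the initial predictions of a subject noun and a verb, and the use of English words in place of Russian are all assumptions made so that the example is short and readable.

```python
# Hypothetical word classes and the predictions each class generates.
PREDICTS = {
    "PREP": ["NOUN"],   # a preposition predicts a noun to follow
    "ADJ": ["NOUN"],    # an adjective predicts a noun
    "NOUN": [],
    "VERB": ["NOUN"],   # a verb predicts an object (a simplification)
}

def scan(tagged_words, initial_predictions=("NOUN", "VERB")):
    """Scan left to right, keeping a foresight and a hindsight pool."""
    foresight = list(initial_predictions)  # predictions still pending
    hindsight = []                         # words no prediction accounted for
    for word, word_class in tagged_words:
        if word_class in foresight:
            foresight.remove(word_class)   # prediction fulfilled: cancel it
        else:
            hindsight.append(word)         # hold for later reconciliation
        # Keep the pool current: newly generated predictions go in front.
        foresight = PREDICTS.get(word_class, []) + foresight
    return foresight, hindsight

print(scan([("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")]))
print(scan([("quickly", "ADV"), ("dogs", "NOUN"), ("run", "VERB")]))
```

In the second call the adverb satisfies no pending prediction and lands in the hindsight pool, and the object predicted by the verb is never supplied, so it remains in the foresight pool when the scan ends.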

The Chairman. Colonel Dillon, do you have any questions? 

Colonel Dillon. No, sir. 

The Chairman. I think that covers it pretty well, unless there are 
further questions, Doctor. 

I hope Mrs. Rhodes, who has in her brain the store of knowledge 
regarding this matter which is so technical and rather mystifying, 
will recover shortly from her illness and the program will proceed as 
we anticipate. We want to thank you for coming here on her behalf 
and making your presentation. 

Dr. Cannon. Thank you, Mr. Chairman. 

The Chairman. There is one more thing I want to say to the committee, 
since we are all here. 

We have worked out this formula, and I just want to tell the mem- 
bers of the committee here, in reference to the contracts that have been 
giving us considerable trouble, I think we can say for the committee 
that in the future if a contract comes up and there is some question 
about it, the letting of it by NASA, that we are first going to check 
this with the agency of government that has been provided by Con- 
gress, the General Accounting Office, asking the General Accounting 
Office to investigate the matter and report. 

The General Accounting Office has technicians that we don’t have, 
and oftentimes they need these technicians for the purpose of question- 
ing these contracts and verifying them. Then only after we hear 
from GAO is it intended that the committee will look into a contract 
award on its own. I think that is the best we have been able to work 
out. If anybody has any further suggestions on it, we would like 
to know. 

Now, with reference to the contract recently mentioned to me by so 
many Members of Congress, we invited the General Accounting Office 
to come in and make a full investigation, which they are doing now. 
They put their people on it, they have a team up here, and when their 
report comes through we will take the matter up with the committee. 
In that way we don’t assume the responsibility of any award, and we 
use the agencies which Congress has provided us with in the past for 
the purpose of making the inquiry. 

If the GAO finds something palpably wrong they will certify that 
to the committee, and then the committee will take it up. 

I would be glad to have your views on that matter in the future, 
but for the time being that is what we thought out. 

Mr. Fulton. 

Mr. Furron. I would recommend there be nothing in the report 
except what applies to the particular bill. So that eliminates in the 
report any reference to the procedure you are speaking of. 

The Chairman. Yes, I have no reference to the report at this time, 
but I mean generally in the future for the committee, these contracts 
are apt to come up from time to time, and I have been questioned as 
to the rules of the committee in reference to handling of the con- 
tracts. It seems to me it is fundamental when we have an agency 
of that sort whose duty is to check into it, before the committee goes 
into the propriety of the award of any contract, that we should check 
it with the GAO and get its certification. We are not bound by the 
GAO recommendation, we are not bound to wait on the GAO if it 
drags out too long, but at least we would then be able to find the 
agency of Government which has been charged with that responsibility 
and get their ideas first. 

Mr. Bass. Mr. Chairman, does this mean we would automatically 
refer all contracts to GAO? 

The Chairman. Oh, no. The GAO, though, on its own, can initiate 
an investigation, but only where a contract has been brought to the 
attention of the committee as being something that should be looked 
into, will we then communicate with GAO and ask them to advise us 
what their investigation has been, and following that, advise us what 
recommendations they may make. I do not anticipate it will come 
up often, but we do have those worrisome contracts. That is the com- 
mittee privilege to see what data we have here that is classified on re- 
cent contracts, but I can tell you before you look into it that it amazed 
me that any member of this committee would display the technical 
know-how and requirements needed to analyze and evaluate the 
phases of the contract which are supposed to have value and impor- 
tance in the consideration. 

Is there any further business? 

If not, the subcommittee will adjourn until tomorrow at 10 o’clock. 

(Whereupon, at 11:55 a.m., the subcommittee adjourned, to reconvene 
at 10 a.m., Friday, May 13, 1960.) 


FRIDAY, MAY 13, 1960 


HOUSE OF REPRESENTATIVES, 
COMMITTEE ON SCIENCE AND ASTRONAUTICS, 
SPECIAL INVESTIGATING SUBCOMMITTEE, 
Washington, D.C. 

The subcommittee met at 10 a.m., Hon. Overton Brooks, chairman, 
presiding. 

The Chairman. The subcommittee will come to order. 

This morning we have four witnesses for the subcommittee hearing. 
We have Mr. Paul Borel, Assistant Director of the Central Intelligence 
Agency, and Prof. Leon E. Dostert, director, Institute of Language 
and Linguistics of Georgetown University, and then we have 

r. Walter G. Driscoll, vice president for research of the Baird- 
Atomic, Inc., and Dr. Walter S. Baird, chairman of the board of 
Baird-Atomic, Inc. Because of that fact, unless there is some objec- 
tion, I would suggest to the committee that we take the first two 
witnesses who have come here in a similar capacity and give them 
the first hour, and then the second hour will be for the witnesses from 
Baird-Atomic, Inc. That is what we will do. 

Our first witness is Mr. Paul Arnold Borel, who was born in Switzer- 
land in 1912, and attended public schools in Kansas City, Mo. He 
has a very distinguished record. I am not going to attempt to place 
his entire biography in the record at this point, but the members of 
the committee all have a copy of your biography, and we recognize 
you as an authority on the subject that you present yourself to this 
morning at the hearings. 

We would be very happy, Mr. Borel, if you would proceed with 
your prepared statement. 

Mr. Borel. Thank you, Mr. Chairman. 


STATEMENT OF MR. PAUL A. BOREL,⁷ ASSISTANT DIRECTOR FOR 
CENTRAL REFERENCE, CENTRAL INTELLIGENCE AGENCY, AND 
CHAIRMAN OF DOCUMENTATION COMMITTEE OF THE U.S. INTELLIGENCE 
BOARD 


Mr. Chairman and members of the committee, the Central Intelli- 
gence Agency is pleased to respond to your invitation to outline its 
views on machine translation. 


7 Paul Arnold Borel: Born March 15, 1912, in Zurich, Switzerland. Married Miriam 
Eleanor Chesham, and has six children. 

Attended public schools in Kansas City, Mo. B.S. in C. E., School of Engineering and 
Architecture, University of Kansas (1934); M.B.A., Graduate School of Business 
Administration, Harvard University (1938); M.A. (International Administration) Columbia 

Our interest in MT dates from 1951, when some of our scientists 
discussed the possibility of developing an automatic indexing and 
translating machine with Dr. James Perry, then with the MIT Center 
for International Studies and now Director of the Center for Docu- 
mentation and Communications Research, Western Reserve Uni- 
versity. After some preliminary work, Dr. Perry and CIA repre- 
sentatives, in June 1952, attended a meeting at MIT of linguists, 
logicians and mathematicians on the subject of machine translation. 
The principal result of that meeting, which was promoted by Dr. 
Y. Bar-Hillel and supported by the Rockefeller Foundation, was the 
further stimulation of interest and the realization of possibility in 
the minds of some of the linguists present. 

In the next 2 years or so, CIA reviewed various proposals, including 
proposals from MIT, the Battelle Memorial Institute, and Georgetown 
University. Some of these were considered jointly with elements 
of the Department of Defense. 

Our position during that period was that the development of a 
machine translation capability was highly desirable, and hence that 
we should support an MT program. We recognized, however, that 
such a program had implications which transcended the interests of 
CIA and those of the intelligence community. We therefore con- 
sidered it preferable that an organization with broader responsibili- 
ties than our own be prevailed upon to take the initiative to push a 
comprehensive MT program. We identified our immediate need as 
a usable product, i.e., one which might well be far short of a perfect 
translation but nevertheless highly useful. In return for an early 
MT capability to produce a usable product, we were willing to leave 
the achievement of superior results to a longer range program. 

This pragmatic approach was our aim and purpose in 1954. It 
remains our aim and purpose today. 

In early 1955, CIA approached the National Science Foundation, 
and concurrently ascertained the degree of interest in the Department 
of Defense. These overtures were directly related to one of a suc- 
cession of proposals by Prof. Leon E. Dostert, of Georgetown Uni- 
versity. Defense representatives were “all in substantial agreement 
that, while the Department of Defense does not find it possible to 
authorize any funds for this project, we will be very much interested 
in any such device once its feasibility has been firmly established.”⁸ 

Negotiations with the National Science Foundation culminated, in 
early 1956, in an exchange of correspondence between Dr. Alan T. 
Waterman, Director, National Science Foundation, and Mr. Allen W. 


University (1944); LL.B., the Law School, the George Washington University (1951); 
member, Bar of the Supreme Court of the United States; graduate, the National War 
College (1950). 

Prior to World War II, worked as an engineer for Sun Oil Co., Black & Veatch Con- 
sulting Engineers, and Phillips Petroleum Co. 

Active naval service from December 1940 to December 1946, initially as naval cost 
inspector, then as military government and civil affairs specialist, including duty at the 
Potsdam Conference and the Paris Peace Conference. Presently captain, USNR, in Naval 
Reserve Politico Military Affairs Co. 

Joined CIA in March 1947, and has served as Secretary of the Intelligence Advisory 
Committee, and Deputy Assistant Director for National Estimates. Present position: 
Assistant Director for Central Reference, and Chairman of the Committee on Documenta- 
tion of the U.S. Intelligence Board. 

8 Assistant Secretary of Defense (R. & D.) letter to Director of Central Intelligence, 
dated Apr. 5, 1955. 

Dulles, Director of Central Intelligence. The National Science Foundation 
agreed— 


to administer any part of a program of research in machine translation which 
is agreed by all concerned to be desirable.⁹ 


CIA recognized the need for careful planning and coordination— 


to insure maximum progress toward our immediate goal of a machine capability 
to translate the Russian technical literature.¹⁰ 


Two short excerpts from Mr. Dulles’ letter of February 29, 1956, 
are useful to point out some of the broad considerations : 


I should like to reaffirm the deep interest which we in the intelligence field 
have in the possibility of translation of Russian language materials, particularly 
in scientific fields, into English by machine. In addition, many of us feel that 
the degree of human understanding that could be accomplished if language 
barriers could be lowered without sacrificing linguistic integrity might well be a 
major step toward peace. 

* * * * * * * 


It is our opinion that much is to be gained by the early development of a 
machine capability for translation. The national security can be well served 
if we have available the scientific and technical literature of the U.S.S.R. in 
English for detailed analysis as early after publication as possible. I am assured 
by leaders in electronic research that technological problems yet unsolved need 
not stand in the way of the rapid development of a machine once the linguistic 
research had been started. 


In the period between May 1956 and the present, the U.S. Govern- 
ment has provided financial and logistical support for the Georgetown 
project totaling some $730,000, as follows: 


Department of Defense (estimated value of computer time provided 
  without reimbursement by Air Force and Army)................ 120,000 
CIA grants (direct or reimbursement to NSF)................... 493,000 


Considering the inherent difficulties of the undertaking, we have 
been very pleased with the substantial progress made by Georgetown 
University. Efforts to date have been experimental, with Russian 
organic chemistry texts processed totaling about 500,000 words, and 
those in French nuclear physics some 200,000. It would be extremely 
valuable to apply the lessons learned to the automatic translation of 
several million words of texts covering various disciplines of particu- 
lar interest, such as, organic chemistry, geophysics (astronomy, 
meteorology and celestial mechanics), physical chemistry, high energy 
physics and solid state physics. And in this connection we are cur- 
rently studying a proposal by Georgetown University to conduct a 
large-scale operational feasibility test during the next fiscal year. 

As a further indication of the development in the state of the art, 
I should also mention that within the last 10 days we received a pro- 
posal by the International Business Machines Corp. for the establish- 
ment of an automatic translation facility. The committee may be 
interested in the exchange of correspondence between Dr. E. R. Piore, 
director of research, IBM," and Mr. Dulles.* I have attached copies 
of that correspondence to your text. 


Director, National Science Foundation letter to Director of Central Intelligence, dated
Mar. 23, 1956.

1 Director of Central Intelligence letter to Director, National Science Foundation, dated 
Apr. 9, 1956. 

"IBM letter to DCI, dated Apr. 20, 1960. 

“DCI letter to IBM, dated Apr. 30, 1960.

This proposal, too, is under study.

It is fair to ask whether developments since our initial interest in 
MT have called for a change in original objective. In the main, the 
answer is “No.” In recent years the volume of available Russian sci-
entific and technological literature has greatly increased. We estimate
that the annual available output is now about 780 million words. This
increase has been accompanied by increased efforts by the Government
to translate the most useful part of this production. And the per-
formance, 95 percent of which is by the Government or under Govern-
ment contract, is impressive. About 53 million words of Russian
scientific literature are now being translated annually (of which CIA 
accounts for over 9 million). 

Why then the need for MT? The reasons can, I think, be simply 
stated. There are seven, as I view it. 

1. The volume of publication will continue to increase, and at a rate 
in excess of our ability to procure competent translations. 

2. The quality of translation work done through contract arrange- 
ments is not uniformly excellent. Whatever the level of accomplish- 
ment in MT at any given time, the output is uniform. In short, MT 
holds out the promise of a uniformly more accurate product. 

3. MT also promises greater speed. We now give priority to cate- 
gories and languages of greatest interest. Nonpriority items are in- 
variably slow in reaching the reader. Perhaps the translator with the 
particular skill in a language, or in a discipline, cannot immediately 
take on the task. In any case he cannot translate on an average more 
than 2,600 words/day. The machine can hurtle them out at rates of 
3,000-50,000 words/hour depending on the computer used. And these 
rates will increase. Even if postediting were required, the man- 
machine system would appreciably outproduce the human translator 
working alone. 

4. With MT, more translations would be available. This increased 
availability of translations would itself generate new and more wide- 
spread demands for them. We now strive to pass over only marginal 
material, but cannot be sure that we are invariably successful. 

5. Greater availability would result in a better informed corps of 
scientists in this country. This would result in superior evaluations 
of scientific and economic developments in the bloc than is now 
possible. 

6. The development of a two-way MT capability would make pos- 
sible low cost production of American publications for sale in under- 
developed countries where low cost bloc publications now have an 
almost clear field to the detriment of U.S. interests. 

7. And finally, the research done and the techniques developed for 
accomplishing translation by machine would contribute materially 
to the solution of problems in the broader field of information storage 
and retrieval, and the emerging field of language data processing. 
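
The figures quoted above lend themselves to a rough back-of-envelope comparison. The sketch below uses only numbers stated in the testimony (780 million words of annual output, 53 million words translated, 2,600 words per translator-day, 3,000 to 50,000 words per machine-hour). The 250 working days per year is an assumption added purely for illustration, and postediting time is ignored.

```python
# Back-of-envelope comparison using figures quoted in the statement.
ANNUAL_OUTPUT_WORDS = 780_000_000         # estimated annual Russian output
TRANSLATED_WORDS = 53_000_000             # words now translated annually
HUMAN_WORDS_PER_DAY = 2_600               # upper bound per translator
MACHINE_WORDS_PER_HOUR = (3_000, 50_000)  # slowest and fastest rates cited

# Assumption (not in the testimony): 250 working days per year.
WORKING_DAYS_PER_YEAR = 250


def coverage_fraction(translated, available):
    """Share of the available literature that is currently translated."""
    return translated / available


def translator_years(words, words_per_day=HUMAN_WORDS_PER_DAY,
                     days_per_year=WORKING_DAYS_PER_YEAR):
    """Translator-years needed to translate `words` at the quoted daily rate."""
    return words / (words_per_day * days_per_year)


def machine_hours(words, words_per_hour):
    """Machine-hours needed to process `words` at a given hourly rate."""
    return words / words_per_hour


if __name__ == "__main__":
    share = coverage_fraction(TRANSLATED_WORDS, ANNUAL_OUTPUT_WORDS)
    print(f"share translated today: {share:.1%}")
    print(f"translator-years for the full output: "
          f"{translator_years(ANNUAL_OUTPUT_WORDS):,.0f}")
    for rate in MACHINE_WORDS_PER_HOUR:
        print(f"machine-hours at {rate:,} words/hour: "
              f"{machine_hours(ANNUAL_OUTPUT_WORDS, rate):,.0f}")
```

Under these assumptions roughly 6.8 percent of the annual output is translated today, and covering all of it would take about 1,200 translator-years, against 260,000 machine-hours at the slowest quoted rate or 15,600 at the fastest.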

A word should be said about some of the problem areas. I am not
too concerned about the technical problems. There are many, of
course. For example, an all-purpose rapid print reader would cer-
tainly be essential to an efficient system. Today we prepare com-
puter input by punching each Russian word onto IBM cards and then
go from card to magnetic tape.

Our best card punch operators can prepare only about 8,000 
words per day in this manner. But good work is being done in the
technical area and solutions will be forthcoming. More basic is the
problem of organization. Shall an MT capability once achieved be 
exploited by each on his own or should a central facility serve all? 
If the latter, who shall set it up, who shall operate it, and under 
what terms shall Government and private interests participate?
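
The input bottleneck mentioned above can be made concrete with a small calculation. This is only a sketch: the 8,000 words per operator-day and the machine rates of 3,000 to 50,000 words per hour come from the statement, while the 8-hour machine day is an assumption introduced here.

```python
import math

KEYPUNCH_WORDS_PER_DAY = 8_000            # best operators, per the statement
MACHINE_WORDS_PER_HOUR = (3_000, 50_000)  # range of machine rates cited

# Assumption (not in the testimony): the computer translates 8 hours a day.
MACHINE_HOURS_PER_DAY = 8


def operators_needed(machine_words_per_hour,
                     hours_per_day=MACHINE_HOURS_PER_DAY,
                     operator_words_per_day=KEYPUNCH_WORDS_PER_DAY):
    """Keypunch operators required to keep one machine supplied with text."""
    daily_machine_words = machine_words_per_hour * hours_per_day
    return math.ceil(daily_machine_words / operator_words_per_day)


if __name__ == "__main__":
    for rate in MACHINE_WORDS_PER_HOUR:
        print(f"{rate:,} words/hour -> {operators_needed(rate)} operators")
```

On these figures a single machine would need between 3 and 50 first-rate operators to stay busy, which is why a rapid print reader is described as essential to an efficient system.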

It is not too early to start thinking about this. I believe a central 
facility is indicated, but not exclusively so. The enormous potential 
output of MT greatly exceeds the present and prospective require- 
ments of any one part of Government or single private organization. 
Problems of procuring and selecting materials to be translated, and 
of disseminating translations to those needing them, are very con- 
siderable. These can most efficiently and economically be solved 
centrally. Moreover, a central facility permits the use of equipment 
exclusively designed to produce automatic translations. There are, 
however, requirements for accomplishing translations under mobile 
conditions, or, for fully utilizing general-purpose equipment acquired 
for processing data rather than language. Hence there is also con- 
tinuing need for research to develop MT materials and programs in 
various languages and disciplines for translation by general-purpose
computers.

When MT is discussed there is invariably an expressed interest in 
what the Soviet Union is doing in this field. I will not dwell on this 
except to say that the Soviets have a program which considerably 
exceeds our own in scope and size, and that they are doing very good 
theoretical work, though restrictions on the availability of computer
time have limited opportunities to apply theory to practice. Two
papers, one by Professor Oettinger (Anthony G. Oettinger, “A Survey
of Soviet Work on Automatic Translation,” “Mechanical Transla-
tion,” vol. 5, No. 3, December 1958, pp. 101-110), and one by Dr.
Harper (K. E. Harper, “Soviet Research in Machine Translation,” 
Rand Corp. Monograph No. P-1896, Feb. 4, 1960, 17 pp.), provide 
valuable assessments of the Soviet effort. The Joint Publication 
Research Service series “Soviet Developments in Information Proc- 
essing and Machine Translation,” will also be of interest to the 
committee. 

I have copies here of the two papers I referred to, and also sample 
copies of the JPRS series which I will leave with the committee. 

The Chairman. Are they lengthy papers, the two you refer to by
Dr. Oettinger and Dr. Harper?

Mr. Borel. No, sir.

The Chairman. If they are not very lengthy, we might set them
forth in the record. What would you think of that?

Mr. Borel. Dr. Oettinger’s article is from the publication “Me-
chanical Translation.”

The Chairman. It is too long to make it a part of the official
transcript.

Mr. Borel. And the other is a 17-page study, sir.

The Chairman. But the committee would like very much to have
them available.

Mr. Bass. I wonder, Mr. Chairman, whether the article requested
was a short article in that large pamphlet you have shown?

Mr. Borel. It is an article of about 10 pages out of this publication.
So you have about 27 pages altogether.

The Chairman. You better let it remain available for members of
the committee and the staff, if you will.

Mr. Borel. You may keep them, sir.

The Chairman. Thank you, sir.

Mr. Borel. I will close my remarks with a word about coordination
of MT activities among Government departments. Until recently 
only a few departments or agencies had programs. As the field grew 
CIA took steps to formalize within the intelligence community the 
informal channels of communications used by those in charge of MT 
programs, whether in the intelligence or the research components of 
their respective departments. A group of experts was formed whose 
membership is drawn from the Army, Navy, Air Force, Department 
of State, NSA, and CIA, with a National Science Foundation repre- 
sentative as associate member. 

The functions of this interdepartmental group are (1) to advise the 
Committee on Documentation of the U.S. Intelligence Board with 
respect to all machine translation activities; (2) to coordinate MT 
activities within the intelligence community; and (3) to inform its 
members of new projects and of the status of existing projects. For 
overall coordination in matters transcending the interests of the intel-
ligence community, CIA looks to the National Science Foundation.

Thank you. 

The Chairman. Thank you very much, Mr. Borel, for a very con-
cise, well thought out, well-prepared statement.

Next, we have the statement of Professor Dostert, the director of 
machine translation research at Georgetown University. 

Professor Dostert. 


STATEMENT OF PROF. L. E. DOSTERT,* DIRECTOR, MACHINE TRANS- 
LATION RESEARCH AND LANGUAGE PROJECTS, GEORGETOWN 
UNIVERSITY 


Mr. Dostert. Mr. Chairman, members of the committee, George-
town University is happy to respond to your invitation to report on
its work in the field of machine translation over the last 8 years. 


13 L. E. Dostert: 1955 to present: Director of machine translation research project of
Georgetown University, originally under sponsorship of National Science Foundation and 
subsequently under the Central Intelligence Agency. 

1949-59: Founder and director of the Institute of Languages and Linguistics, George- 
town University, professor of French civilization. Chairman, Department of Foreign 
Languages, School of Foreign Service, Georgetown University. 

1953-54: In charge of research and organization for mechanical translation experiment,
Georgetown University-IBM Corp.

1948-49: Secretary General, International High Frequency Broadcasting Conference,
Mexico City, under United Nations auspices.

1947-48: Administrative counselor, International Telecommunications Union.

1946-47: In charge of organizing simultaneous interpretation system at the United
Nations.

1942-46: U.S. Army (major to colonel). (1945-46: In charge of organizing simulta-
neous interpretation system, Nuremberg war crime trials. 1942-45: Staff officer and
interpreter to General Eisenhower. 1942-44: Liaison officer to French commander in chief.)

1941-42: Professor of French civilization, Scripps College, Claremont, Calif. 

1939-41: Attaché, French Embassy, Washington. 

1926-39: From instructor to professor of languages, Georgetown University College of 
Arts and Sciences and School of Foreign Service, chairman of department, 1936-39. 

Degrees: B.S.F.S., Georgetown University, 1928; Ph. B., Georgetown University, 1930; 
M.A., Georgetown University, 1931; postgraduate studies in languages, Johns Hopkins 
University, 1931-35; Litt. D., Franklin and Marshall College, 1957; LL. D., Georgetown 
University, 1958; Litt. D., Occidental College, 1960.

Awards: United States: Legion of Merit with Oak Leaf Cluster; Bronze Star with Oak
Leaf Cluster; European Theater Ribbon with four combat area stars. France: Knight of
the Legion of Honor; Croix de Guerre with two Palms; Officier d’Académie. Morocco:
Commander Ouissam Alaouite. Tunisia: Commander Nisham Iffikar.

Georgetown University became actively interested in machine trans-
lation as a direct result of our participation in the meeting called at
the Massachusetts Institute of Technology under the sponsorship of 
the Rockefeller Foundation in June of 1952. 

Though skeptical of the prospects of machine translation when in-
vited to the conference, I came away convinced that there was enough
promise to warrant an empirical approach to the problem. I stress
the word “empirical,” Mr. Chairman, because in the June 1952 meet-
ing the vastness and complexity of a general theoretical solution be-
came quite apparent in the course of the discussions, and I felt that
a pragmatic cumulative approach might prove more fruitful.

One basic assumption which guided our thinking from the outset 
was that the research and experiments should be based on actual texts 
which would be used as the lexical and grammatical basis for MT 
operations. 

A second assumption was the belief that the texts utilized for lexical
and structural analysis should be available in the English language,
as translations from actual Russian texts.

The third basic determinant was that our research should be focused 
on scientific and technological literature, and that, as reasonable lin- 
guistic and programing results were achieved, these should become 
the basis of actual machine tests on general computers. 

Finally, the output of such cumulative tests was to be used as the
basis for the continuing improvement of the linguistic formulation 
and the programing techniques, as well as for the updating of the ma- 
chine dictionary. 

After the MIT meeting in 1952, I submitted to various Government 
agencies plans for the initiation of research along the lines just re- 
viewed. Since I did not propose a “machine” my efforts were in vain. 
The reaction in Government circles was generally skeptical, and many 
argued that the development of electronic computers at the time 
was not sufficiently advanced to warrant the expectation of positive 
results. 

It is in these circumstances that Georgetown University arranged 
to carry out a very limited machine test for the translation of about 
120 Russian sentences based on a lexicon of 250 entries and a small 
set of manipulation rules. 
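
To make the idea of a small machine lexicon concrete, here is a minimal sketch of word-for-word dictionary lookup. It is not the Georgetown-IBM program: the four lexicon entries and their transliterations are hypothetical and were chosen only for illustration, and the manipulation rules (for word order and choice among alternatives) are not modeled at all. Words missing from the lexicon are passed through untranslated in upper case.

```python
# Hypothetical toy lexicon (transliterated Russian -> English). These entries
# are illustrative only and are not taken from the Georgetown dictionary.
LEXICON = {
    "kachestvo": "quality",
    "uglya": "of coal",
    "opredelyaetsya": "is determined",
    "kaloriynostyu": "by calory content",
}


def translate(sentence, lexicon=LEXICON):
    """Word-for-word lookup; unknown words pass through in upper case."""
    out = []
    for word in sentence.lower().split():
        out.append(lexicon.get(word, word.upper()))
    return " ".join(out)


def coverage(sentence, lexicon=LEXICON):
    """Fraction of running words that are found in the lexicon."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    return sum(word in lexicon for word in words) / len(words)


if __name__ == "__main__":
    print(translate("Kachestvo uglya opredelyaetsya kaloriynostyu"))
    print(translate("kachestvo kondensata"))
    print(coverage("kachestvo kondensata"))
```

The second call shows what happens with a word outside the lexicon: it survives in the output as an untranslated token, and the coverage figure drops accordingly. Enlarging the lexicon is the only remedy in a scheme this simple.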

Professor Garvin of Georgetown University, and Mr. Sheridan of 
IBM made very significant contributions to this first experiment. 

In January 1954, for the first time, so far as it is known, one lan- 
guage was actually translated into another (Russian into English), 
on a limited scope, on an electronic computer, the IBM 701. Because 
of the novelty of the experiment and the fact that it was decided by 
IBM that it should be made public, it received widespread notice in 
the press and other media of communication. 

Let me insert orally here, Mr. Chairman, that this was frowned
upon by some. However, it is part of our culture to seek the sensa- 
tional, and others have been exposed to the same difficulties that we 
encountered. The last case in point in reference to machine trans- 
lation appeared on page 1 of the second section of last Wednesday’s 
New York Times.

Encouraged by these first results, which we had characterized as
merely the Kitty Hawk of machine translation, we submitted new 
proposals to various Government agencies, again without affirmative
response. We met with the argument that the test was trivial in 
scope and that indeed some of the operations were open to question in 
terms of their scientific validity. Machine translation research in 
America continued practically at a standstill during 1955. Early in 
1956, in the journal, Problems of Linguistics, Soviet experts reviewed
the Georgetown-IBM experiment—and in contrast with some of the 
reviews which appeared in this country—it was remarkable and com- 
mendable for its objectivity. They announced at that time that they, 
on the basis of the information gathered from the Georgetown experi- 
ment, had started research in the field of machine translation, which 
they claimed had brought them beyond the level of achievement 
demonstrated in the Georgetown-IBM test.

After this Soviet article became known to the American defense 
and scientific community, there was a marked renewal of interest and 
a significant strengthening of support for research in the field of 
machine translation. 

Since the fall of 1956, Georgetown has conducted research and 
experiments on an increasingly broad scale, with primary focus on 
preparation for translation, on preparing for the translation, I should 
say—of Russian into English in the field of organic chemistry. 

The initial collation of lexical materials and the analysis of struc-
tural and semantic data was based on a Russian text from the Soviet
Journal of General Chemistry, totaling approximately 30,000 running 
words, for which a translation was already available in English. From 
the outset the Georgetown project encouraged diversity of approach to 
the problem of machine translation. The assumption that diversi- 
fied concepts and techniques would converge because of the similarity 
of the objectives was unfortunately not confirmed by actual expe- 
rience, and a great deal of unproductive controversy was generated 
among two or three of our groups. It is gratifying to report that in 
1959-60 our project has a more unified orientation. It should also 
perhaps be mentioned that some of the former members of the George-
town staff associated with the project have been able to devote their
efforts to the work done or planned in other centers. Thus the spirited 
emulation of the first 2 years among our several groups was not 
totally unproductive. 

At the present time the Georgetown project has reached the fol- 
lowing stage: 

1. A total of 395,000 words in continuous texts in the field of organic 
chemistry in Russian have been keypunched, along with a corpus of 
20,000 words in the field of metallurgy, for a grand total of 415,000
keypunched running words. Of these, a corpus or segment of 268,000 
running words has been lexically abstracted, yielding a dictionary, 
now coded, of 10,800 entries. 

In addition, a total of 106,000 running words have been the object
of structural analysis. Finally, a total of 115,000 words from these 
corpora have been translated from Russian into English with vary- 
ing degrees of acceptability and smoothness. Of these 115,000 trans- 
lated words, about 1,100 were translated as a random text in June of 
1959. The same 1,100 words were run again in January of 1960 to

КАТАЛИТИЧЕСКОЕ АЛКИЛИРОВАНИЕ ФЕНОЛА МЕТИЛОВЫМ СПИРТОМ

И. Н. Самсонова

[Russian source text of the article: Journal of General Chemistry, vol. 27, No. 10, 1957, pp. 2697-2699.]


GEORGETOWN UNIVERSITY MACHINE TRANSLATION RESEARCH


A. MACHINE TRANSLATION OF RANDOM TEXT* (8 JUNE 1959)


CATALYTIC ALKILIROVANIE PHENOL

BY METHYL ALCOHOL

I. A SAMSONOV

A SERIES OF WORKS CATALYTIC ALKILIROVANIH PHENOL SECONDARY AND TERTIARY
ALCOHOLS WAS DEDICATED //1/-/8/. ATTEMPTS CATALYTIC ALKILIROVANI4 PHENOL BY
INITIAL ALCOHOLS LED TO NEGATIVE RESULTS //2/, /3/, /9/ OR WERE ACCOMPANIED BY
INSIGNIFICANT YIELDS //1/0/+/1/3/. UPON ALKILIROVANII PHENOL BY METHYL
ALCOHOL NILUCWIE RESULTS WERE OBTAINED ON THE ACTIVATED OXIDES OF ALUMINIUM

NATURAL BY US WAS STUDIED THE PROCESS ALKILIROVANI4 PHENOL BY METHYL
ALCOHOL. OVER ACTIVATED ALHMOSILIKATNYM KATALIZATOROM-GUMBRIN AT
ATMOSPHERIC PRESSURE IN THE USUAL APPARATUS, APPLIED UPON AN INVESTIGATION
PAROFAZNYX CATALYTIC REACTIONS IN PROTOCNO] A SYSTEM. LIQUID WERE OBTAINED AS
A RESULT OF A REACTION AND GASEOUS PRODUCTS. PHENOLS LIQUID DVUSLOLNY1
KONDENSAT, BESIDES NOT WHICH DID NOT ENTER INTO THE REACTION OF THE INITIAL
SUBSTANCES, CONTAINED WATER, ALKILIROVANNYE AND NEUTRAL THE REACTION PRODUCTS.

FOR VY4SNENI4 THE RELATIONS OF A YIELD ALKILIROVANNYX PRODUCTS FROM A
TEMPERATURE A ACCUMULATION KONDENSATA FOR ANALYSIS WERE CARRIED OUT BY THREE
TEMPERATURES /350, 420 AND 500@/, UPON THE MOLAR RATIO OF THE PHENOL AND
METHYL ALCOHOL 1.. 6 FOR OF THE SUPPLY OF THE INITIAL MIXTURE, EQUAL 8-10 ML.
INTO A HOUR OVER 100 ML. OF A CATALYST. FROM 350 TO 500@ A YIELD
ALKILIROVANNYX PHENOLS INCREASED WITH AN INCREASE OF THE REACTION TEMPERATURE
ALMOST AND AMOUNTED TO IN 3 OF A TIME 68% /IN A ESTIMATE ON PHENOLS KONDENSATA/.
THE OPTIMUM YIELD OF THE NEUTRAL PRODUCTS OF CONDENSATION WAS OBTAINED AT 420@
AND CONSTITUTED 23% /IN A ESTIMATE ON ALL KONDENSAT/.


* "Random" designates a text taken in a discipline for which a dictionary has
been developed on the basis of a running body of text of which the random
text is not a part. Also, "random" designates a text the content of which
has not been analyzed for the purpose of determining specific transla-
tion rules required for translating this particular text.




B. MACHINE TRANSLATION OF THE 8 JUNE 1959 TEXT


AFTER PARTIAL IMPROVEMENT OF THE DICTIONARY AND TRANSLATION PROGRAM


(25 JANUARY 1960)


THE CATALYTIC ALKYLATION OF PHENOL BY METHYL ALCOHOL

I. N. A SAMSONOV

TO THE CATALYTIC ALKYLATION OF PHENOL SECONDARY AND TERTIARY ALCOHOLS WAS
DEDICATED A SERIES OF WORKS /1-8/. THE ATTEMPTS OF A CATALYTIC ALKYLATION
YENILA BY ALCOHOLS LED TO NEGATIVE RESULTS /2,3,9,/ OR THERE WERE ACCOMPANIED
BY INSIGNIFICANT YIELDS /10 - 13/. UPON THE ALKYLATION OF PHENOL BY METHYL
ALCOHOL BEST RESULTS WERE TUT ACTIVATED OXIDES OF ALUMINIUM /14/.

BY US WAS STUDIED THE PROCESS OF THE ALKYLATION OF PHENOL BY METHYL
ALCOHOL OVER ACTIVATED NATURAL ALUMOSILICATE BY A CATALYST + BY A GUMBRIN CLAY
AT ATMOSPHERIC PRESSURE IN THE USUAL APPARATUS, APPLIED UPON THE INVESTIGATION
OF VAPOR PHASE CATALYTIC REACTIONS IN A CIRCULATING SYSTEM. AS A RESULT OF A
REACTION WERE OBTAINED LIQUID AND GASEOUS PRODUCTS. A LIQUID TWO-LAYERED
CONDENSATE, BESIDES NOT DID NOT ENTER INTO THE REACTION OF THE INITIAL
SUBSTANCES, CONTAINED WATER, ALKYLATED PHENOLS AND NEUTRAL THE REACTION
PRODUCTS

FOR VY4SNENI4 THE RELATIONS OF THE YIELD OF THE ALKYLATED PRODUCTS FROM A
TEMPERATURE, THE ACCUMULATION OF A CONDENSATE FOR ANALYSIS WERE CARRIED OUT BY
THREE TEMPERATURES /350, 420 500@/, UPON THE MOLAR RATIO OF THE PHENOL AND
METHYL ALCOHOL 1.. 6 FOR OF THE SUPPLY OF THE MIXTURE, 8 - 10
A HOUR OVER 100 ML. A CATALYST. WITH AN INCREASE OF THE REACTION
TEMPERATURE FROM 350 TO 500@ THE YIELD OF THE ALKYLATED PHENOLS INCREASED
ALMOST IN 3 OF A TIME AND AMOUNTED TO 68% /IN A ESTIMATE ON THE PHENOLS OF A
CONDENSATE/. THE OPTIMUM YIELD OF THE NEUTRAL PRODUCTS OF CONDENSATION WAS
OBTAINED AT 420@ AND CONSTITUTED 23% /IN A ESTIMATE ON ALL THE CONDENSATE/.


C. SAME TEXT BY A HUMAN TRANSLATOR


The Catalytic Alkylation of Phenol With Methyl Alcohol


Journal of General Chemistry,
Vol 27, No 10, 1957, pp 2697-2699


I. N. Samsonova


A number of works have been devoted to the catalytic alkylation of
phenol with secondary and tertiary alcohols (1-8). Attempts at catalytic
alkylation of phenol with primary alcohols led to negative results (2,3,9),
or resulted in insignificant yields (10-13). For the alkylation of phenol
with methyl alcohol, the best results were obtained on activated aluminum
oxide (14).

We studied the process of the alkylation of phenol with methyl alcohol
over an activated natural alumosilicate catalyst - gumbrin clay - at atmos-
pheric pressure in the usual apparatus used for investigation of vapor
phase catalytic reactions in a circulating system. As a result of the re-
action, liquid and gaseous products were obtained. The liquid, a two-layered
condensate, besides unreacted starting materials, contained water, alkylated
phenols and inert reaction products.

To determine the dependence of the yield of alkylated products on
temperature, an accumulation of condensate for analysis was produced at
three temperatures (350, 420 and 500°), at a mole ratio of phenol and
methyl alcohol of 1:6 and rate of feed of the starting mixture equal to
8-10 ml. per hour over 100 ml. of catalyst. With an increase of reaction
temperature from 350° to 500° the yield of alkylated phenols increased
almost 3 times and reached 68% (based on the phenols of the condensate).
The optimum yield of inert condensation products was obtained at 420°
and amounted to 23% (based on the entire condensate).


56002 O-60 (Face p. 91)

permit a comparative study of the June 1959 and the January 1960
tests, since in the meantime the linguistic formulation, programing
techniques, and lexical data had been significantly improved. Finally,
again out of the 115,000 translated words, a total of 5,000 words were 
translated in January as a random text—that is, they were selected 
from the field of organic chemistry, but from a corpus which had not 
previously been the object of lexical or structural exploitation. 

Mr. Chairman, I would like to insert here that these texts from 
Russian, together with the random text from French in nuclear 
physics, are available for the committee’s inspection, and for inclu- 
sion in the record of these proceedings, along with a random segment 
in organic chemistry from Russian to English. Since the quality
and reliability of the Georgetown tests has been the subject of com-
ments before this committee, may I make respectfully the following
suggestion: The committee might consider entrusting its technical
consultant with the task of selecting 5,000, 10,000, 15,000 words in 
Russian in the field of organic chemistry. The keypunching should 
be done under the exclusive direction and supervision of the commit- 
tee’s technical consultant. 

The Chairman. Professor, may I interrupt you? What do you
mean by keypunching?

Mr. Dostert. Punching texts on cards, sir; perforating cards.
Then when the text keypunched under the direction of the commit- 
tee’s consultant has been finished, it could be brought by the consultant 
himself to a place where we have access to a machine and the text 
would be run. I would suggest, further, that the output be placed in 
the hands of two or three chemists in academic institutions not con- 
nected in any way with machine translation or with any of the spon- 
sors—in other words, completely independent of any sponsorship or 
other association with machine translation. And I would finally sug- 
gest that the source of the text be not indicated, except to say that it 
was translated on a machine. I would be perfectly willing to stand 
by the judgment of such experts on such an experiment. 

(The results will be found in the appendix of the report “Research
on Mechanical Translation” (serial d) of the Committee on Science
and Astronautics.)

Dr. Dostert. I now return to my prepared statement.

An additional 146,000 words, part of the total of 415,000 key-
punched words mentioned earlier, has now been the basis of a supple-
mentary total list which will be used for the further updating of the
dictionary, and it is expected that in a matter of 2 months, the total
number of lexical entries for the field of organic chemistry (Russian-
English) will be of the order of 16,000 words, including general lan-
guage words. We shall then run random texts in organic chemistry
to determine the acceptability and smoothness of the output. If the
dictionary of 16,000 words proves still markedly insufficient, our plan
is to keypunch 200,000 words more for a further updating of the
dictionary.

2. In the field of French to English translation, the research led
to the formulation of a generalized programing system called the
simulated linguistic computer. This system is being used in part in
the conversion of the Russian-English “Georgetown Automatic
Translation Technique.” From an IBM 705 program we are con-
verting to an IBM 709 program. Assuming continuing support of
the research, the next step would probably be to program for a still
more versatile instrument in a year or so, either an IBM or a more
promising machine.

3. The plans for the coming year call for the keypunching of some
25 million running words in five distinct disciplines and for the prep-
aration therefrom of a separate technical lexicon for each of the sev-
eral disciplines. It is expected that 5 million words or so of this key-
punched material will be used for the development of lexical resources
and that the balance of the machine output will serve through sys-
tematic revision as the basis for the continued improvement of the
structural and semantic operations, as well as for actual use as trans-
lated materials after revision.

And in this connection, Mr. Chairman, I would like to insert the
following statement and have it as a part of the record, in view of
what has been said earlier in the hearings about the problem of re-
vision. In my years of experience in organizing and directing trans-
lation staffs I have never known or heard of one, whether in the U.S.
Government or in international agencies, that did not provide for re-
vision. Yet we have been told that revision in the case of machine
translation renders the machine output trivial.

4. I should mention also that the machine translation studies from 
French into English were based on a 200,000 word corpus. About 
10,000 words of this text have been translated. Also, about 5,000 
words from nonprocessed texts in the field of nuclear physics were 
translated as a random test. 

The work on French to English has not had the same priority as 
the Russian to English. That is why more complete results cannot 
be reported. 

5. A preliminary analysis of converging features of syntax in sev- 
eral Slavic languages is being pursued. 

6. Preliminary work has been started looking to the translation 
of English into Arabic and Chinese. However, according to present 
indications, the forthcoming emphasis will be from Chinese into Eng- 
lish rather than from English into Chinese. 

7. Along with the experiment-focused research, a certain amount 
of theoretical investigation has taken place in the following fields: 

(a) An approach to the establishment of semantic categories; 

(b) The broadening of our syntactic analysis; and

(c) The development of a program for the machine composition 
of chemical terms not found in the machine lexicon. 

Mr. Chairman, I have with me as supporting witnesses, in case they
are needed, three of my colleagues: Dr. Brown, Professor Zarechnak,
and Mr. Toma. Thank you.

The Chairman. Thank you very much, Professor Dostert, and I
think the record should show, too, that Mr. Houston, General Counsel
of CIA, is here as a supporting witness.

Mr. Borel. May I also mention that Mr. J. Bagnall, the Chief of our
Foreign Document Division, and Chairman of the Committee on
Exploitation of Foreign Language Publications, a committee of the
U.S. Intelligence Board, is also present with us. Mr. Paul W.
Howerton, the project officer for our machine translation program,
is absent, being overseas on temporary duty.

I appreciate your kind words at the beginning. I am not here
posing as an MT expert, but as the responsible operating official having 
general supervision of this program. 

The Chairman. I think your statement, Professor Dostert, was an
excellent statement. It is most informative, and I get a rather op-
timistic reaction from the character of the statement that you made
to the committee. You have a protege up here on the committee, and
in deference to the fact he is one of your former pupils—and I refer
to Mr. King of Utah—I am going to recognize Mr. King to start
with the questioning. [Laughter.]

Mr. King. The tables are turned, Professor.

Mr. Dostert. Yes, Mr. King.

Mr. King. I would like to say for the record that I had the great
privilege of being enrolled in a class of French literature under Pro-
fessor Dostert during the academic year of 1933 and 1934. At that
time I considered you a giant, and in view of your brilliant perform-
ances since that time I might add that time has further enlarged the
dimensions of your stature. The world is very indebted, I feel, 
Professor Dostert, for the great contribution that you have made in 
this field. 

Mr. Dostert. You are very generous, Mr. King.

Mr. King. I say this sincerely: While I was in your class, I learned
there that languages were living things, rather than just dead bodies. 
You were able to capture the living essence, the spirit of the French 
language, and present it to the students. I was impressed with that. 
I believe you were in the room last Wednesday when I raised a ques- 
tion or two and expressed some skepticism over whether or not the 
machine could capture the living essence of a language because of the 
myriad of idiomatic constructions, and because the language reflects 
the culture and the thinking and personality of the persons who are 
speaking. I expressed some skepticism at that time. 

I had the opportunity of discussing this matter briefly with you 
in the corridor. You gave me some excellent thoughts at that time. 
I am wondering if you could briefly get into the record the thoughts 
that you gave me as to whether or not the machine is actually able to 
capture this personality of a particular language. 

Mr. Dostert. I think, Mr. Congressman, your skepticism is com-
pletely founded, and I would go further and say that it is seldom
that a human translator can capture the most elusive aspects of
language. If you read the soliloquy of Hamlet in English and then
in French, the translation may be there, but much of the spirit has
gone. However, I would stress this: In the field of machine transla-
tion we are not concerned, with due deference to one of the things you
mentioned, Mr. King, with “runs on banks,” or Dr. Oettinger’s “runs
in stockings.” We will take care of that problem when we run into 
it in an actual text. 

We are concerned in the field of machine translation with what I 
have sometimes described as “inert language,” to contrast it with 
elusive or metaphorical lyrical language. We are not concerned with 
the translation of belles lettres. We are concerned with the transla-
tion of scientific and technical literature, where the author speaks in
a much more pedestrian, if I may say so—in a much more inert, de-
scriptive, narrative pattern.

For that reason there is less of that elusive quality of language in
the kind of texts that we deal with than we would find in texts in
belles lettres. This is why, incidentally, from the beginning I in-
sisted upon focusing our research on scientific literature, because if
someone comes to me with a given sentence which is ambiguous and
says “How are you going to handle that?” my spontaneous answer
is “Where did you find it?” And if he can not quote me a source
then I say—“I will meet the problem posed when I encounter it in a
text.” I would, therefore, say, Mr. Chairman, and Mr. King, that
you are quite correct in stressing the fact that there are many con-
notative metaphorical, lyrical, elusive areas of language which for the
calculable future the machine cannot even dream of approaching.
But this does not negate the very practical objective of producing
reasonably acceptable translation of scientific texts, which can be
rapidly revised by human beings, and which will enhance the sources
of information for the scientific community.

Mr. King. I appreciate your answer. It is completely lucid and
corroborates the feeling I have had about this. Would we be correct
in concluding, therefore, that we should always recognize the inherent 
limitations of this system? And recognizing those limitations, should 
we be grateful for what we have, accepting the advantages—and they 
are tremendous—but not trying to push this thing beyond its natural 
limits? 

Mr. Dostert. I agree, sir; we should not push it beyond its limits,
and I would say also in every area of scientific development and tech-
nology, we do not expect to arrive at ultimate results at the outset.
We go by cumulative, empirical-pragmatic experience. Lindbergh
had to fly the Atlantic in 1927, in a monoplane, alone, and today we
put 90 or 100 people in jet planes. But that does not invalidate the
importance of the 1927 flight. 

Mr. King. We had testimony before this subcommittee—I think it
was Wednesday, and perhaps again Thursday—that other systems
that are being worked out have been able to achieve—oh, I think
the figure 50 and 55 percent accuracy was given. Are you prepared
to state, under the results that you have had in the system worked out
at Georgetown, whether your percentage is higher or lower than the
figure given?

Mr. Dostert. Mr. King, I would be very reticent to evaluate either
the quality of our own work or that of other people in the field before 
this committee. I would rather leave that to the objective study of 
technical experts. I think it is exceedingly difficult—and I have been 
in the field of translation in a practical way in many instances—to 
evaluate the quality in terms of percentages. 

Suppose, for example, that you have a particular machine diction-
ary gap in a given text and that the word reoccurs 57 times in the
course of 10 pages. Are you going to say that you are faced with 57
errors, or are you going to say that you are faced with one error, due
to one dictionary gap?

So, to me, this sort of evaluation is really meaningless, because I do 
not think it can be formulated on a scientific basis. 
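
[Illustrative note. The distinction drawn in this answer, between counting every recurrence of an unmatched word and counting each dictionary gap once, can be sketched as below. The function name, the small glossary, and the sample text are hypothetical and are assumed only for illustration; they are not taken from the Georgetown materials.]

```python
from collections import Counter

def count_dictionary_gaps(tokens, glossary):
    """Count unmatched words two ways: per occurrence and per distinct word."""
    missing = Counter(t for t in tokens if t not in glossary)
    occurrences = sum(missing.values())   # every recurrence counted as an error
    distinct = len(missing)               # each dictionary gap counted once
    return occurrences, distinct, dict(missing)

# Hypothetical data: one word absent from the glossary recurs 57 times.
glossary = {"the", "reaction", "of", "acid"}
tokens = ["the", "reaction", "of", "acid"] * 10 + ["ketone"] * 57

occurrences, distinct, detail = count_dictionary_gaps(tokens, glossary)
print(occurrences, distinct)  # prints: 57 1
```

[The same text thus scores as 57 errors or as 1, depending only on the convention chosen, which is the difficulty with a single percentage figure that the witness describes.]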

Mr. King. I can well understand your modesty and your reluctance
to institute comparisons. I think that is correct. But what you are 
saying is that your results have been very encouraging.

Mr. Dostert. They are by no means perfect, and I think for the
calculable future we shall need human revisers. We have planned, as
a matter of fact, if the support of the project continues—this is always
a big “if”—to train some 50 translators from the Government to do
revision work in terms of not only improving the output for use, but
also of collecting information which will enable the researchers to im-
prove the performance of the machine by plowing into the linguistic
information and electronic programs, the results of the error patterns
discovered.

If we are able to do so, I should think within a reasonable time—I
think the machine output will require less and less revision, but I can
not foresee the day when it will not require any revision.

Mr. King. Mr. Chairman, one final thought and I am through.

Professor Dostert invited this committee to put it to an objective 
test. I would be happy to have the committee accept his invitation, 
put it to a test, and let the tests be reflected in the record. 

The Chairman. Do you mean, Professor, that you would like the
committee to be present?

Mr. Dostert. I would be most pleased, sir. 

The Chairman. I will ask counsel to confer with you later, and
those of us who can attend the demonstration will be glad to attend.

Mr. Dostert. Thank you, sir.

The Chairman. Any questions?

Mr. Roush.

Mr. Roush. No questions.

The Chairman. I would like to ask several questions posed by our
staff consultant, Colonel Dillon.

For the past 2 days we have been discussing the diversity of the 
effort on the part of the Government in this field. Five different 
agencies are engaged in research on this same subject. I believe that 
this is spreading the Government-management talents a little too thin. 
CIA chairs one of the two committees on machine translation. 

Do you have any thoughts or recommendations on this multiplicity?
I address that to the CIA representative here.

Mr. Borel. I would say, Mr. Chairman, that at the present state
of development of this art, diversity ought to be encouraged rather 
than to the contrary. After all, these centers are not all working 
on the same problem. It is all translation, to be sure, but we have 
somebody working with German, somebody else with Russian, and also 
a variation in disciplines, shifting from physics to chemistry to bio- 
chemistry texts. I see no duplication here. The Soviet effort is of 
more recent origin and in fact going ahead much more rapidly. While 
I do not say we ought to copy them, they do have a centralized 
form of government, and if centralization were necessary at this stage 
of the game I think they would have invoked it. I think there are 
in excess of 80 institutions in the Soviet Union that have programs 
in this field, including a score or more institutes.

The Chairman. What institutes are those?

Mr. Borel. These are Institutes of the Academy of Sciences of the
U.S.S.R., as well as those of Union Republic Academies of Science.
They have major programs, for example, at the Institute of Precise
Mechanics (Moscow), the Electromodelling Laboratory of the All-
Union Institute of Scientific and Technical Information (Moscow),
the Steklov Institute of Mathematics (Moscow), and the Experi-
mental Laboratory for Machine Translation (Leningrad). Coordi-
nation of our effort is very much in order, so that diversity can be
planned rather than haphazard. This has been our objective to 
date. Centralization is needed when we speak of creating an MT 
facility. In operating such a facility we would not be primarily 
concerned with advancing the state of the art, but exploiting results 
of our research by producing translations. That is another matter 
altogether. 

The Chairman. Do you feel, Mr. Borel, that a planned diversity
would eliminate waste, would bring us to the focal point of approach- 
ing perfection, or at least approaching satisfactory translation quicker 
and at an earlier time than we would if we proceeded along one road? 

Mr. Borel. That is correct. We have done some planning already,
although at this stage it is based more on a matter of convincing some-
body he ought to go down one road rather than another without hav-
ing to say, “You shan’t do this or that.”

The Chairman. You would still have some agency like the National
Science Foundation, in sort of overall directional control; wouldn't
you?

Mr. Borel. To coordinate the effort; yes, I think that is imperative.

The Chairman. How important do you place this project on a
security basis to this country?

Mr. Borel. Well, that is a very difficult——

The Chairman. We will add on an intellectual basis, in addition
to security, as security is first.

Mr. Borel. I think, obviously, you are not creating something here
that doesn’t exist. We in CIA feel we are rather well looked after 
in view of our manual translation program. What we want is to 
improve the speed of getting translations. As far as we are concerned, 
the amount of money we spend on an MT program designed to accom-
plish this has to be balanced off against other important programs.
I would not put the MT program at the head of the list.

The Chairman. Of course, the amount of money you spend now on
your translation, your manual translation, would be money largely
saved in the event you have this mechanized translation.

Mr. Borel. Ultimately you would make some savings, but the fact
that you would have to do extensive review work for the next few 
years, and be limited to translations in a few disciplines would not 
result in immediate financial gains. In fact, you would have to have 
considerable outlay and increase your costs for the next few years 
before you would break even, let alone save money. 

The Chairman. It would be a long time before you save money,
wouldn’t you say that?

Mr. Borel. Yes, I would say, a long time.

The Chairman. Mr. Bass.

Mr. Bass. Mr. Borel, I would like to explore with you at greater
length the present status of our program of translating Russian
scientific information into English. I believe you said earlier that
the Russian output of scientific information now is about 780 million
words a year; is that correct?

Mr. Borel. That is correct, sir.



Mr. Bass. And that of that total amount we are now translating 
into English 53 million words a year?

Mr. Borel. Yes.

Mr. Bass. In other words, we are making use of about 7 percent,
if you want to use that as a basis—7 percent of the total output.

Mr. Borel. I would not say that we limit our use of it to 7 percent.
True, we translate in full only some 7 percent. But we use much more 
of it. Many read the Russian from the original text and get some 
benefits from that. Much of available publications are at least 
scanned. 

Mr. Bass. Well, do we scan the entire 780 million, or whatever 
it is? 

Mr. Borel. This is a key question, sir. With your permission I
would like to ask Mr. Bagnall to answer that. He is the head of our 
translation program—manual translation program. He might ad- 
dress himself to that question. 

The Chairman. Where is Mr. Bagnall? Ask him, if he will, to
stand up there.

Will you give your full name to the reporter, Mr. Bagnall?

Mr. Bagnall. John J. Bagnall.

Of the 780 million words estimated of available output from the 
Soviet Union in scientific and technical fields, I would say that we 
scan half of that. 

Mr. Bass. So would it be fair to say we are making use of half of 
the Russian scientific output?

Mr. Bagnall. Yes, sir.

Mr. Bass. Do you do this on a selective basis? How do you pick
that particular path?

Mr. Bagnall. We find the major value in the periodical literature.
The largest part of the 780 million words estimated output, of course, 
is contained in the monographs or books, in scientific subjects. We 
find that ordinarily the books contain information in scientific fields 
which has been previously published in the periodical literature, and 
is subsequently collated in book form appearing several years later. 

Consequently, by scanning the periodical literature which is current 
and up to date, much of the book literature may be left out. 

Mr. King. Would the gentleman yield for just one question?

Mr. Bass. Yes.

Mr. King. When you use the word “we,” are you referring to the
CIA, or are you referring to all Government agencies concerned with
the problem of translation?

Mr. Bagnall. I am referring to the CIA, sir.

Mr. King. You suggested that about half of this monumental total
was being scanned by the CIA. Are there other agencies that would
be scanning the other half?

Mr. Bagnall. Yes, sir.

Mr. King. Then the full answer to the question of the gentleman
from New Hampshire would be that through one agency or another it
is all being scanned, or virtually all?

Mr. Bagnall. Virtually all; yes, sir.

Mr. Miller. Will the gentleman yield?

Mr. Bass. Yes, Mr. Miller.

Mr. Miller. For clarification of this last statement, do you check
with other agencies that may be doing translations, to insure that there
isn’t a duplication of this scanning going on?

Mr. Bagnall. Yes, sir. We do. We have a committee to coordi-
nate this work so that we do not duplicate. We are constantly aware
of what the other agencies are doing.

Mr. Miller. Let me try to simplify it in my own mind, not for you,
because you gentlemen are way ahead of me, and I just want to make 
sure I am not left behind. You know, I am sure, an agency of this 
Government in Europe scans Russian publications, and does a very 
effective job. You also have at the Library of Congress a group 
scanning. Let’s say scanner or translator A in Europe picks up a 
Russian technical publication and finds something in it that might be 
interesting, or something that he catalogs for future reference in the 
case. Have you any assurance that translator B, in the Library of
Congress, in his turn, hasn’t picked up the same document and dupli- 
cated it? Then in making their reports we would say there were two 
documents indexed, rather than as a matter of fact a duplication. 
You can’t cut this duplication out, and in trying to do it you would be 
spending more time administratively trying to make sure that they 
weren't done, and it is perhaps much better to go ahead and duplicate 
it. Isn’t that true?

Mr. Bagnall. Well, we believe that in the coordinated program for
U.S. Government agencies, particularly the intelligence community of 
course, that we are looking at and scanning virtually all of the Russian 
scientific literature that is made available to us; that is, that the 
Russians will allow. 

Mr. Miller. Can’t you almost say you scan 100 percent of the scien-
tific literature that is made available to us, or that we can get our
hands on?

Mr. Bagnall. Yes, sir.

Mr. Miller. That doesn’t mean you translate everything, but as I
understand it, these people are well skilled and know what they are
doing. If they look at an article and think it is worthwhile they
translate it; if not, they index it so that at some time in the future
somebody is liable to come down and say “I want to get all the litera-
ture on this subject.” They can go in and say “There is this piece that
hasn’t been translated, but if you want it we will translate it; if you 
don’t want it we won't translate it.” Isn’t that the way it works? 

Mr. Bagnall. That is right, sir.

Mr. Bass. To put it another way, Mr. Bagnall, you feel we are fully
exploiting the availability of all this Russian scientific and tech-
nological information that is coming to us. Is that a fair statement
in your opinion?

Mr. Bagnall. Let me say that we are aware of and scan the litera-
ture that is made available to us. For complete analysis of much of
this literature, full translations would be required. Full transla-
tions take considerable time, as you well realize, and of course if the
translations could be made available much more rapidly for analysis 
in connection with current events and developments, it would be 
advantageous.

The Chairman. Let me take this opportunity to compliment you
for the progress and improvement you are making.

We had a hearing in this committee about a year ago on this same
subject, and we didn’t get nearly as good a report as you are able to
give us today. Thank you for the interruption, Mr. Bass.

Mr. Bass. I have one or two more questions.

The Chairman. Go right ahead. I didn’t mean to interrupt you.

Mr. Bass. I think this is extremely important. Is this information 
available to other Government agencies, and to scientists and tech-
nicians working in private industry?

Mr. Bagnall. All of the information, shall we say extracted or
translated from the Russian scientific literature is available to other
Government agencies and to the general public. For example, the
particular output of CIA is a scientific information report summariz-
ing the highlights of developments in the Russian scientific literature,
and is issued by the Office of Technical Services, Department of 
Commerce, on subscription to the general public. 

In addition, all translations that are produced are made available
through the Office of Technical Services to the general public. 

Mr. Bass. Mr. Borel has just handed me, I take it, such a publica- 
tion that you are speaking about. 

Mr. Bagnall. That is correct, sir.

Mr. Borel. Samples of several, sir.

Mr. Bass. Samples of several.

Mr. Borel. Including the one specifically mentioned.

Mr. Bass. One other question, Mr. Bagnall. Would you comment 
as to the time lag between the time the Russian text reaches this coun- 
try and the time it is abstracted or translated and made available to 
anyone in the scientific community? Is that a problem? 

Mr. Bagnall. Yes, it is a problem. In general, the highlights of
important items may be abstracted and made available within 4 to 6 
weeks after date of publication. However, translations of important 
articles will be made available, depending on their length, of course, 
at considerably later dates, running from 1 month to perhaps 6 
months for a book say of three or four hundred pages. 

Mr. Bass. Now, if I am a physicist working for IBM, or some other
concern, is it fair to say I can go to the CIA and get all the available
information you have abstracted or translated on that particular
subject?

Mr. Bagnall. Actually, the CIA, of course, does not have a func-
tion of distributing to the general public. We rely on the distribu-
tion mechanism available in the Office of Technical Services, Depart-
ment of Commerce, so that as a physicist he may go to the Department
of Commerce and obtain there copies of what has been produced. 

Mr. Bass. Thank you. 

Your testimony along this line has been most encouraging to me, 
because I had heard contrary reports, and I think it is vitally im- 
portant that we know what the Russians are up to in the field of science 
and technology. And from what you have told us, we are pretty 
much up on that. 

Mr. Bagnall. Yes, sir.

The Chairman. As a matter of fact, it is a great improvement over
the report that we received about a year ago when we recommended 
that this program be extended and be made more complete. We are 
pleased, too, all of us, I believe, over this. 

Mr. King. May I have one more question?

Mr. Miller. May I have a question?

The Chairman. We are running late. I am going to put my ques-
tions in the record. But if somebody has some further questions—
Mr. Miller.

Mr. Miller. For instance, the Army does some translation. Do you
coordinate with an institution like this?

Mr. Bagnall. Yes, sir.

Mr. Miller. They do translating there?

Mr. Bagnall. That is correct, and we maintain records and co-
ordinate with them.

The Chairman. Mr. King has a question.

Mr. King. It has been testified that we are translating roughly 7
percent of the available Russian scientific and technical output and 
scanning practically all of it. The question I would like to ask is: 
Do you know how much of our technical literature is being translated 
into Russian, and do we do any of that ourselves, or do we leave that 
entirely up to the Russians to do their own translating? [Laughter.] 

Mr. Bagnall. The latter is correct; that is, we leave it up to the
Russians. They have an enormous organization in Moscow.

The Chairman. One is in Washington, isn’t it? [Laughter.]

Mr. Bagnall. For procurement, probably, yes. For translation
and abstracting in Moscow they have an organization of some 2,000 
translators and some 20,000-odd abstractors used part or full time for 
just our scientific and technical literature. What percentage of our 
literature is translated into Russian I could not say offhand. 

Mr. King. Of course, that figure wouldn’t be quite as significant as
the 7-percent figure, because there are more Russians who can read
English than there are Americans who can read Russian.

Mr. Bagnall. That is right, many more, sir.

The Chairman. May I ask you one question, and then I am going
to ask, if I may, to submit a number of questions prepared by coun-
sel to you gentlemen, which I would be pleased if you would answer
briefly so we can have it a part of the record. My one question is
this: Are we doing enough in this respect, and what would you recom-
mend briefly that we do to speed up this program if you think it is
that important?

I will ask any one of you gentlemen. 

Mr. Borel. Mr. Bagnall is a little reluctant to speak to this point
because this MT program is designed to put him out of business.
[Laughter.]

That is a tough question—what more can we do? I think we are, 
in a sense, at a turning point, in that both the Georgetown proposal 
and the IBM proposal envisage taking a step which hasn’t in the past 
been tried. In other words, large sums are now sought, not to do 
further research, but to see how we can apply what has been learned 
to date. We can do more by making such tests possible. As far as
evidence of progress to date, I respectfully refer you to samples of
machine translations from Russian to English and from French to
English placed into the record by Professor Dostert.

The Chairman. Have we reached the point of applying the re-
search?

Mr. Borel. Yes; at least in some areas. And it is a question of
business judgment, whether you have reached the point where
progress made to date has been sufficiently good to justify the expendi-
ture of a lot of money. As I said earlier, this matter is under study
right now in our Agency, and I can report no conclusion to date.
My own opinion is that a central facility of some kind is in order, and
that the problems other than translation itself are so significant, that
planning for an early beginning of such a facility is in order. To-
day would not be too early to start coping with the entire problem.

The Chairman. Professor Dostert has a different opinion. Do you,
Professor?

Mr. Dostert. Well, not really, Mr. Chairman. What we have done
now is to move to a certain point in one discipline, organic chemistry. 
All that I suggest is that we try to utilize what we have learned in one 
discipline and apply it to the development of resources in other 
disciplines. 

The Chairman. Then have we reached the point of application of
even that one discipline?

Mr. Dostert. Yes, sir, I think we have learned, for example, now
that in order to arrive at a fairly acceptable translation in organic 
chemistry a lexicon of the magnitude of 16,000 to 20,000 words is 
likely to prove adequate. When we go to a new discipline we will 
keypunch half a million words. At the beginning we didn’t know 
how many words we would have to keypunch to get 15 to 20 thousand 
lexical entries. Now we know that. If we move into geophysics, 
the first thing we would do would be to keypunch 500,000 words and
see what we get out of it in terms of a word list. Then we compare 
the two word lists, and see what words are common to the two dis- 
ciplines and what words are specific to each discipline, and so on 
down the line with other disciplines. 
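
[Illustrative note. The comparison of two word lists described in this answer, separating words common to both disciplines from words specific to each, can be sketched with set operations. The function name and the two short word lists are hypothetical, chosen only to show the idea; real lists would run to thousands of entries.]

```python
def compare_word_lists(list_a, list_b):
    """Split two word lists into shared words and words specific to each list."""
    a, b = set(list_a), set(list_b)
    return {
        "common": sorted(a & b),   # general vocabulary shared by both disciplines
        "only_a": sorted(a - b),   # terms specific to the first discipline
        "only_b": sorted(b - a),   # terms specific to the second discipline
    }

# Hypothetical word lists, assumed for illustration.
chemistry = ["acid", "method", "result", "solvent"]
geophysics = ["method", "result", "seismic", "strata"]

comparison = compare_word_lists(chemistry, geophysics)
print(comparison["common"])   # prints: ['method', 'result']
print(comparison["only_a"])   # prints: ['acid', 'solvent']
print(comparison["only_b"])   # prints: ['seismic', 'strata']
```

[Repeating the comparison as each new discipline is added would show how much of a new lexicon is already covered by the shared list.]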

Now, this is still research, Mr. Chairman. This is not mass produc- 
tion of translation. The translation that will come out from the 
machine will be used as the basis for continuing research, but the prod- 
uct will have some practical value. 

The Chairman. Sure, you would apply it as of today, but continue
your research. 

Mr. Dostert. That is right. 

The Chairman. Now, let me see if I have this correct. You have
been keypunching on cards, words from Russian texts and then run- 
ning these cards through an IBM 705 or 709 computer. The computer 
has a glossary of 10,800 entries. It also finds the matching words, ap- 
plies several rules of syntax, and prints out the English equivalent. 
How does this empirical approach differ from the program presented 
yesterday by Dr. Cannon of the Bureau of Standards?

Mr. Dostert. I think I can answer this question by making the fol-
lowing points:

(1) The Georgetown machine translation research antedates the 
work undertaken at the National Bureau of Standards in 1958. 

(2) No information is available to me as to the number of running 
words that the Bureau of Standards has keypunched—in what fields, 
and what is the resultant total of glossary or dictionary entries. 

(3) All machine translation, regardless of technical differences in 
approach, must proceed with word matching and various operations 
based on grammatical and syntactic information, and thence with the 
synthesis for printout into English. There is, insofar as we are in- 
formed, a certain amount of overlapping between certain aspects of 
the syntactic operations of the Bureau of Standards and those of the 
Georgetown automatic translation. 

(4) More ample information and results from the National Bureau 
of Standards effort would have to be available to give an explicit 
answer to your question. 
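
[Illustrative note. The three stages named in point (3), word matching, operations based on grammatical and syntactic information, and synthesis for printout, can be sketched in miniature as below. This is a toy under stated assumptions and not the Georgetown or Bureau of Standards program: the five-word French glossary, the part-of-speech labels, and the single reordering rule are all hypothetical.]

```python
# Hypothetical toy glossary: source word -> (English equivalent, part of speech).
GLOSSARY = {
    "la": ("the", "DET"),
    "solution": ("solution", "NOUN"),
    "aqueuse": ("aqueous", "ADJ"),
    "est": ("is", "VERB"),
    "stable": ("stable", "ADJ"),
}

def look_up(words):
    """Word matching: unmatched words pass through in brackets for the reviser."""
    return [GLOSSARY.get(w, (f"[{w}]", "UNKNOWN")) for w in words]

def reorder(entries):
    """One assumed syntactic rule: a noun directly followed by an adjective is swapped."""
    out = list(entries)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "NOUN" and out[i + 1][1] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return out

def translate(sentence):
    """Synthesis: join the English equivalents in their final order."""
    entries = reorder(look_up(sentence.lower().split()))
    return " ".join(english for english, _ in entries)

print(translate("la solution aqueuse est stable"))  # prints: the aqueous solution is stable
```

[A word missing from the glossary is left in brackets rather than guessed at, which corresponds to the dictionary gaps and human revision discussed earlier in the testimony.]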

The Chairman. On the bottom of page 3 of your statement you
mentioned a great deal of unproductive controversy in your groups.
Would you tell the committee what the differences of opinions were
that caused these controversies?

Mr. Dostert. (1) One group came to the early conclusion that it had
a very powerful technique which would give broad and significant 
results. Unfortunately, while it minimized the value of some of the 
techniques of other groups, it did not report its technique to the point 
where an objective evaluation could be made. The members of the 
group are no longer affiliated with the present project. 

(2) Another group focused on syntactic analysis and was critical of 
the text-focused approach. Some of its members are now working for 
another center and, notwithstanding the earlier difficulties, there seems 
to be a measure of parallelism between the work done by them now 
and the past and current Georgetown procedures. 

(3) Moreover, Mr. Chairman, it is normal that in a field of in-
vestigation there should be some divergence of views. I was perhaps
overcritical in using the words “unproductive controversies,” and you 
will recall that the last part of my statement on that subject pointed 
to some productive results. 

The Chairman. On page 5 you mention the plans for the coming
year. I understand you plan to have most of the keypunching done in
Germany and the cards then returned for IBM runs. What do you
expect to gain from this mass production for research purposes? Are
you semioperational?

Mr. Dostert. What we expect to gain from the materials which we
plan to keypunch are the following: 

(1) A firm basis for determining the extent of running texts re- 
quired to yield a working specialized dictionary for a given scientific 
discipline; 

(2) The development of five or six additional specialized diction-
aries for as many disciplines;

(3) A basis for the comparison of general language content of
scientific texts, general technical dictionary words, and specific tech-
nical terms;

(4) The establishment of equivalent technical terminologies in
English on the basis of either existing English translations or the
lexicographical assistance of experts in the several disciplines in-
volved; and

(5) A systematic gathering of data with respect to flaws and in-
adequacies in the translation output to permit the continuing improve-
ment of the linguistic formulation and of certain aspects of the pro-
graming technique.
If by semioperational you mean that part of the output can be 
utilized by users after revision, then the answer for part of this 
material, and in the discipline involved, would have to be “Yes”—in 
the fields indicated we are moving toward a semioperational stage by 
the end of the fiscal year 1961.

The Chairman. Throughout the machine translation family there
is a slight fear that research funds and work will be curtailed or
affected if you or any other organization considers that the product
is ready for production. Would you care to comment on that?

Mr. Dostert. The vastness of the work which remains to be done in
machine translation and the complexity of the task should give no 
basis for any concern over the necessity for a number of years of con- 
tinuing research. 

Indeed, in our own project there will continue to be a major focus 
on research efforts. Moreover, the fact that some measure of ac- 
ceptability and volume is achieved in the case of two languages in 
certain disciplines only should be a factor to encourage the continu-
ing support of research on the basis of preliminary significant re-
sults, rather than curtailing it.

The Chairman. Professor Dostert, do you have any comments on
the overall field of language science in the United States? Are we
in Government doing enough?

Mr. Dostert. Mr. Chairman, the magnitude of the problem of the
years ahead in the field of language communication in its various as- 
pects is such that I believe the time is at hand to study the advisa- 
bility of establishing a national institute of language science some- 
what along the lines of the present National Institutes of Health. I 
would envisage that such an institute would embrace several basic 
areas: Mechanical linguistics (including machine translation), peda- 
gogical linguistics, cultural linguistics, psychological linguistics, and 
lexicography. I would ask permission to submit to your commit- 
tee an outline and table of organization, together with a summary 
of steps in the same fields now being developed or under considera-
tion in the U.S.S.R. This may be entered as part of the proceed-
ings.


1. Language as an instrument of persuasion—It can be assumed that the 
Soviet Union will place increasing emphasis on language as a means of com- 
munication for more effective persuasion. 

2. Magnitude of U.S.S.R. activities in field.—The magnitude of the activi- 
ties now being carried out in Russia in the field of linguistics and related sciences 
is far in excess of the U.S. effort.

3. Machine translation effort in U.S.S.R.—In the field of machine transla-
tion alone, evidence indicates that between 700 and 1,000 specialists in lan-
guages, linguistics, mathematics, computation techniques, and engineering are
at work in the Soviet Union. There are indications that the work on machine 
translation is focused on approximately 50 languages and that in respect to some 
of them, their system is operative. 

4. Extent of machine translation research in United States.—In contrast, there 
probably are 50 full-time specialists working on machine translation in the United 
States, and only 5 languages are involved in the total effort, some of these on a
limited scale.

5. Coordination in U.S.S.R.—A basic advantage which helps Soviet activities 
is effective coordination of the diversified aspects of the total effort. 

6. Magnitude of lag.—Unless steps are taken with a sense of urgency to achieve 
coordination, stimulation, and development of the U.S. effort, we shall be lagging 
further and further behind in this increasingly significant field. 

7. Planned additional activities in U.S.S.R.—A recently published article by 
Andreyev of the U.S.S.R. Academy of Science, of which he sent a copy to the 
undersigned, outlines the several areas in the general field of the science of 
language in which specific initiatives must be undertaken or broadened in the 
Soviet Union. (See attachment C.) 

8. U.S. coordination through a National Academy of Language Science.—To 
coordinate and broaden U.S. work in this field, a National Academy of Language 
Science should be established as a distinct Government-sponsored agency.
It could be placed under the overall executive authority, either of the Presi-
dent’s Executive Offices, the Department of Health, Education, and Welfare,
or the National Academy of Sciences. Its structure could be broadly similar
to that of the National Institutes of Health. Attachment A presents a chart
of the proposed organization and activities of a National Academy of Language
Science.


[Attachment A: organization chart of the proposed National Academy of Language Science, showing Divisions I through VI. The chart is not legible in this copy.]


ATTACHMENT B 
SUMMARY EXPLANATION OF ORGANIZATION CHART


1. The NALS would have as its primary mission the coordination of the
national effort in the various areas of the science of language, both practical 
and theoretical, on the basis of the five substantive divisions given in the chart, 
plus one service division. 

2. In certain fields it would have an operating responsibility whenever this 
procedure would be deemed more effective than the assignment of specific research 
and development projects to selected academic institutions. 

3. It would provide for interfellowships to permit the assignment for 1 year 
or more of top specialists who would be working with the NALS in specific 
research or development projects. 

4. The Advisory Board would be made up of five recognized scholars or author-
ities in the five substantive fields who would function as regular consultants.

5. The Executive Board would be made up of the divisional directors under 
the chairmanship of the Executive Director. 

6. A Supervisory Board could be made up of five executives from academic 
institutions or learned associations who would be appointed on a rotating basis.
The Supervisory Board would also include two members from the Government. 

7. It is estimated that when in full operation the NALS would have a staff
of from 300 to 500 persons. The preliminary budgetary estimate is of the 
order of $15 million per annum, half of which would be for internal operations, 
and the remainder for external projects, based on contractual grants. 

8. The first measure to be taken if the project meets with approval would be 
the appointment of a small planning staff under the direction of the future 
Executive Director, who would review and present a more complete and rigorous 
proposal for legislative consideration. The planning staff would arrange to 
have the advice of five recognized authorities specialized in diverse areas within 
the purview of the proposed Academy, who presumably would become the future 
Advisory Board. 


ATTACHMENT C


The paper received from Andreyev of the U.S.S.R. Academy of Sciences entitled 
“Basic Problems in Applied Linguistics” makes the following points: 

1. Increasing importance of linguistics —The increasing importance of lan- 
guage communication necessitates increasing attention both to theory as well as 
practical aspects of linguistics or the science of language. Although applied 
linguistics in the Soviet Union has made great strides, it is still lagging behind 
other sciences. 

2. Development of alphabets.—Soviet linguists have succeeded in developing 
alphabets for the people of the U.S.S.R. who have no written language. The 
same effort is being made by linguists in southwest China. Africa has a long 
standing problem in this field. 

3. Language teaching.—The importance of linguistics in language teaching 
methodology is increasingly recognized. Morphological and syntactic portions 
of algorithms developed for machine translation can be successfully used in 
language instruction. 

4. Transcription and transliteration.—A uniform transcription and trans-
literation system must be developed.

5. Emerging scientific terminology—The increasing growth of scientific 
terminology through creation of new words for new concepts makes it necessary 
for a major effort to be made in lexicology, both in terms of creating new words 
and standardizing terminologies of diverse disciplines. 

6. Translation of scientific texts.—Translation of scientific texts is subject to
linguistic laws rather than being a problem of aesthetics. 

7. Shorthand.—Improved stenographic systems can be developed on the basis 
of new data derived from information theory. 

8. Speech defects.—Linguists and psychologists can contribute to the improve-
ment of speech defects.

9. Orthoepy.—The importance of developing orthoepy on the basis of the 
methodology of linguistics for greater facility of communication is essential.

10. Communication channels.—It is necessary to insure their effectiveness on
the basis of rigorous linguistic analysis of the accuracy of the messages carried.

11. Compression of speech.—Greater economy in communication systems can
be achieved through this means.

12. Compression of written speech.—At present, lexical coding is being de-
veloped to provide compression of telegraphic codes to about one-fourth their
original length. The importance of written language compression is obvious
for the field of machine translation and data processing.

13. Transposition of codes.—The use of computing machines should be in-
creased for the transposition of codes and for extracting information in lin-
guistic form.

14. Scanning and listening devices.—Research should be accelerated for the
development of electronic scanners, both for visual and auditory perception.

15. Speech synthesis.—Greater emphasis should be given to speech synthesis 
in view of its importance for oral machine translation. 

16. Machine translation.—Even though the present efforts are increasingly 
important, machine translation must make greater strides not only for its prac- 
tical value, but for the information it will yield on the overall theory of 
language. 

The Chairman. Efforts in machine translation research seem so far
to have been aimed at translation into English. Is there any thought
being given to machine translation from English into other lan-
guages, and if so, for what purpose?

Mr. Dostert. We have, as I stated in my formal remarks, initiated
preliminary research for the translation of English into Arabic and 
English into Chinese. The ultimate objective of “reverse English” 
machine translation would be to facilitate the diffusion of basic 
reference materials in science and technology produced in English for 
various areas of the world now emerging to a higher level of tech- 
nological development. 

The Chairman. Each group doing research in this field builds its
own dictionary. Why can’t the research work on dictionaries be 
exchanged ? 

Mr. Dostert. We should perhaps make a distinction between “word
lists” and “dictionaries.” There seems to be no complication in pro-
viding for the reciprocal exchange of word lists developed by the
various groups. It seems to me also that some coordination and 
standardization could be arrived at in regard to keypunching pro- 
cedures and format. However, if by “dictionary” we mean the lexi- 
cal materials plus the codes for analytical operations, this would seem 
less feasible, since the codes are devised on the basis of the particular 
philosophy and techniques pursued by the several groups. 

The Chairman. Professor Dostert, could you tell us what are some
of the major handicaps under which you have been working?

Mr. Dostert. The major handicaps which we have encountered
have been the precariousness of the financial support. As you well 
know, the grants are made on a year-to-year basis and sometimes as 
late as the 1st of June one does not have any firm commitment for 
the next fiscal year. This makes it extremely difficult to recruit the 
type of professional talents required for this work and to plan for 
work ahead on the basis of security and continuity. As an example, 
I would cite that in order to assure continuity of our work during 
the summer and to assure securing an expert consultant in the field 
of chemistry in which we have specialized, I have had to make com- 
mitments without being assured definitely that the support of the 
work would be continued. I would urge respectfully, Mr. Chairman, 
that very careful consideration be given to the handicap which this
procedure causes for scientific research, which must perforce con-
tinue over a period of years.

The Chairman. Mr. Borel, do you believe that the time is at hand
to start planning for a computer designed only for language
translation?

Mr. Borel. In my prepared statement I indicated that it was timely
to plan for a central machine translation facility, but that programs 
should also continue to be developed for MT by general-purpose 
computers. The central facility would take on volume translation.
Hence its economics would justify expenditures entailed in develop- 
ing any special-purpose equipment contributing to an effective ma- 
chine translation system. In terms of the translation problem, the 
computer, whether special or general purpose, performs only a limited 
number of the various operations which must be performed by a com- 
plete translation system. The virtue of procuring a special-purpose 
machine lies in not buying hardware not needed for the special task. 
The disadvantage lies in its being a machine of use to do only trans- 
lation work. But if it is fully busy doing translation work, this point 
is irrelevant. The general-purpose computer, on the other hand, is 
built primarily to do numerical computation, and hence is not neces- 
sarily efficient in handling language data. Nevertheless, many who 
have general-purpose computers would find it useful to be able to 
use them to do translations in addition to their many other uses.
Moreover, general-purpose computers are available now, whereas the 
production model of any special-purpose computer would be at least 
two years away. In short, the time is at hand to start planning for a
machine translation facility. Work done by the U.S. Air Force and 
IBM in the direction of developing an overall system is commend- 
able, but should not lead to a reduction of the valuable research done 
by others. 

The Chairman. I would like to ask Mr. Borel to comment on the
CIA’s concept of the future of machine translation, relative to a 
center operated by the Government on behalf of the Government. 
How would the National Academy of Language Science, referred to 
by Professor Dostert, fit into the scheme of things of the future as you 
see it? 

Mr. Borel. Our thinking on this is quite tentative, Mr. Chairman.
We are not so concerned about who would operate an automatic 
translation center, or where it would be located, but are concerned 
that the job be done by somebody. Today, insofar as the U.S. Gov-
ernment has a central translation center, CIA has it, and operates
it as a service of common concern. Still, the bulk of our requirements 
is for translation of unclassified material and is contracted for out- 
side the Agency. Hence, there is no compelling reason for an auto- 
matic translation facility to be an integral part of CIA. 

Professor Dostert’s plan for a National Academy of Language Sci- 
ence goes well beyond the concept of a translation center. Most of 
what he has outlined as the function of Division I of his six divisions, 
i.e., the Division of Mechanical Linguistics, would be the task of a
facility primarily concerned with producing translations. The crea- 
tion of such a facility should not depend upon the acceptance of the 
whole scheme.

We should, therefore, make a very clear distinction in our thinking
between the function of conducting research in the field of mechanical
linguistics and the function of producing translations by machine.
The former function, that is, the lion’s share of the National Academy
concept, would be properly placed in the Department of Health, Edu- 
cation, and Welfare. The production function should in my opinion 
be separate and operated by a private contractor on behalf of the 
Government. Its ties with CIA and the intelligence community 
should be close in order to foster essential cooperation between the 
facility personnel, the human translators of the intelligence com- 
munity, and the intelligence consumers of the output. 

The Chairman. I certainly want to thank you gentlemen for com-
ing here. Professor Dostert, Mr. Borel, and all of you representing
both Georgetown and the CIA. We appreciate it very much. 

We have two very important witnesses, so I am told by counsel, 
from Baird-Atomic, Inc. 

We would be happy to have you gentlemen stay here. If you have 
to leave because of pressing engagements, it is understood by the 
committee. 

Mr. Dostert. Thank you, Mr. Chairman.

Mr. Borel. Thank you, Mr. Chairman.

The Chairman. At this time we want to thank you very much.

We call now as witnesses Dr. Walter S. Baird, chairman of the 
board, and Dr. Walter C. Driscoll, vice president for research and 
engineering, and Mr. John A. Fitzmaurice, project director, all from 
the Baird-Atomic, Inc. 

If you will all have a seat there, we will hear your statements. I 
understand there are two prepared statements, following which we 
will then ask the questions. I can tell you this: by the questions you 
have already heard, we are interested in this program. 

We will first recognize the chairman of the board of Baird-Atomic, 
Inc. 

Dr. Baird. Dr. Driscoll and Mr. Fitzmaurice will discuss the Baird-
Atomic program, so I will turn things over to Dr. Driscoll.

The Chairman. All right.


STATEMENT OF DR. WALTER S. BAIRD, CHAIRMAN OF THE
BOARD; DR. WALTER C. DRISCOLL, VICE PRESIDENT FOR RE-
SEARCH AND ENGINEERING; AND JOHN A. FITZMAURICE,
PROJECT DIRECTOR OF BAIRD-ATOMIC, INC.


Dr. Driscoll. I think it is appropriate to thank the chairman for
the opportunity to come before you and speak of our optical print
reader. I would also like to thank the Air Force, through the Rome
Air Development Group for the opportunity to pursue the work and
for the funding for this particular project.

Dr. Baird is a graduate of Johns Hopkins University, having received a Ph. D. in
electrical engineering in 1936. His B.S. degree was granted in 1930 by St. Johns College.

From 1934 to 1935 Dr. Baird served as an instructor at Harvard University, and from
1935 to 1936 he was employed as physicist in the Watertown Arsenal.

Baird Associates was founded in 1936 with Dr. Baird as cofounder, and for a period
of 20 years Dr. Baird served as company president. In 1956, with the merger of Baird
Associates, Inc., and Atomic Instruments Co., Dr. Baird became president of the newly
formed organization, Baird-Atomic, serving in that capacity until his appointment in 1957
as chairman of the board.

Dr. Driscoll majored in physics at Boston College receiving the B.S. degree in 1938
and the M.S. degree in 1940. In 1951 he received the Ph. D. in engineering from Brown
University.

The linguistic and data-processing aspects of mechanical translation 
will be covered by other witnesses. 

I will confine this statement to a description of the Baird-Atomic 
optical print reader—a device which will read about 1,000 characters, 
of any language, in 1 second. 

For more than 5 years Baird-Atomic, with Air Force, Navy, Baird- 
Atomic, and other industrial support, has been engaged in research 
and development, relating to automatic recognition of two-dimensional 
patterns; i.e., the interpretation of photographs, and printed or writ- 
ten text. 

Our approach, for solving specific problems of this type, has been 
to apply optical principles to alphanumeric recognition. We feel that 
we have been quite successful. 

From the reputation gained by this experience, we were engaged by 
the Air Force, through the Griffiss Air Force Base in Rome, N.Y.,
in May 1958, to study the problem and to provide a prototype of 
limited versatility to demonstrate the feasibility of our solution. 

Following this initial effort, a second contract, AF30(602)-1828
for $380,695 was received from the Air Force to construct a func-
tional optical reader as an input to the Russian translator being con-
structed by International Business Machines for the Air Force. Plans
call for the reader to be completed and operational in September of 
this year. 

The Baird-Atomic reader, now under construction, is capable of 
recognizing and distinguishing a large number of different type fonts 
in various alphabets, including at least English, Cyrillic, and Greek 
characters. This reader has recognition capability independent of 
the spacing between lines, the position of the text on the printed page, 
and the occurrence of randomly interspersed graphic material. Fur- 
thermore, the present design objective is to provide an instantaneous 
reading rate of about 1,000 characters per second. This speed is ac- 
complished by an optical system which permits comparison of an 
unknown character, printed letter or number simultaneously with each 
of a large set of reference characters. The set of reference characters,
called the system’s memory, are photographic masks or optical apertures
that can be rapidly and inexpensively prepared and replaced
should a change in the language or type font be desirable. Baird-
Atomic simultaneous comparison, recognition, and identification of
characters reduces excessive mechanical motions and eliminates time-
consuming electrical scanning procedures which have been proposed
and utilized previously by others. Mechanical and electrical scanning
procedures usually require ancillary computers of considerable complexity
and cost.

Following his graduation from Boston College, Dr. Driscoll was employed from 1940 to
1946 by the U.S. Government in Washington, D.C., where he was placed in charge of the
FBI Laboratories. From 1951 to 1954 Dr. Driscoll was engaged in advanced graduate
study at Brown University, during which time he served as research associate in the
engineering department.

In 1951, Dr. Driscoll was again employed by the U.S. Government, this time on special
assignment in the Department of Defense. He served in that capacity until 1954 when
he joined Baird Associates as assistant director of research. In 1956, with the merger
of Baird Associates and Atomic Instruments Co., Dr. Driscoll became director of research
for the newly formed organization, Baird-Atomic, Inc.

In 1957, with the incorporation of Baird-Atomic, Dr. Driscoll was named vice president
in charge of research.

John A. Fitzmaurice attended the Illinois Institute of Technology where he was
awarded the bachelor of science degree in physics in 1951. He has also pursued graduate
work in physics at the University of Illinois, the Case Institute of Technology, and the
University of Maryland. He has also studied at the graduate school of business administration,
Northwestern University.

From 1943 to 1946, Mr. Fitzmaurice served with the Marine Corps doing combat intelligence
work. At the National Bureau of Standards where he was employed from 1951 to
1953, Mr. Fitzmaurice was assigned to the mechanical instruments section involved in
studies of fluid flow and thermodynamics as applied to aircraft oxygen equipment. He
was later employed at the ARO Equipment Corp. where he was concerned with the development
of oximeters based on evaporated semiconductor films.

Mr. Fitzmaurice joined Baird-Atomic in March 1954. He has been engaged in pattern
recognition studies with application to missile guidance, aerial reconnaissance, and
alphanumeric character recognition.

The computer requirements for such a scanning reader appear to be 
particularly severe when Cyrillic, English, and Greek characters are 
intermixed as is the case of Russian technical literature. For these 
reasons Baird-Atomic has employed a simple optical approach to the
problem.

Because of a multiplicity of related and unrelated problems associ- 
ated with automatic print reading, Baird-Atomic has limited its ef- 
forts on the present contract to reading text which is first prepared 
on transparent 70 millimeter film. The developed film negative is 
read by the machine. This approach was taken for several reasons 
which relate principally to the manner in which technical papers are 
presented by their authors. They usually include pictures, graphs, 
and multilined mathematical equations, and to cope with these ran- 
dom inserts when word translation is the prime interest, would in- 
crease the cost of the development considerably. Furthermore, al- 
though the problems are not insurmountable, the availability in time 
of such a functional system would be questionable. 

The decision to work with photographic transparent text does not, 
in any way, preclude the possibility of reading opaque material di- 
rectly. In fact, Baird-Atomic is presently considering the elements 
of opaque text reading independently of the program being discussed 
here. 

Although we would like to discuss the details of our technical ap- 
proach, we probably will not have that opportunity at this time, al- 
though if you care to, we will. 

Perhaps a word or two is in order regarding the economic payoff 
that may be realized by automatic reading in preference to retyping 
and feeding the translator in some alternate manner. 

A reasonable total cost for typing Russian text on a typewriter and 
simultaneously punching a paper tape to feed the translator is about 
one-fourth cent per word. Our present estimate for automatic read- 
ing is about one-twentieth cent per word. This, we believe, is a con- 
servative estimate. It includes all operating costs and it assumes a 
complete amortization of both the development and production costs 
in 10,000 hours of use. 
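
The comparison in the preceding paragraph is simple arithmetic, and a short sketch makes the ratio explicit. The two per-word figures are those quoted in the testimony; the one-million-word workload and the function names are assumptions chosen only for illustration.

```python
from fractions import Fraction

# Figures quoted in the testimony, in cents per word.
TYPING_CENTS_PER_WORD = Fraction(1, 4)
READING_CENTS_PER_WORD = Fraction(1, 20)


def cost_in_dollars(words, cents_per_word):
    """Total cost in dollars for a given number of words."""
    return float(words * cents_per_word / 100)


def cost_ratio():
    """How many times cheaper automatic reading is than typing, per word."""
    return TYPING_CENTS_PER_WORD / READING_CENTS_PER_WORD


if __name__ == "__main__":
    words = 1_000_000  # hypothetical workload, not a figure from the testimony
    print("typing: ", cost_in_dollars(words, TYPING_CENTS_PER_WORD))
    print("reading:", cost_in_dollars(words, READING_CENTS_PER_WORD))
    print("ratio:  ", cost_ratio())
```

On the quoted figures, automatic reading comes out at one-fifth the per-word cost of typing, whatever the size of the workload.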

On the following page I have prepared a brief technical discussion 
of our system. I think when you bring up technical questions regard- 
ing the system, that I would like to suggest that Mr. Fitzmaurice, 
who is at my right, answer them. He is our senior project engineer 
for this system. 

This discussion is intended to be a layman’s description of the phys-
ical principles underlying the Baird-Atomic method of optical char-
acter recognition. It is presented to familiarize you with the sim-
plicity and the ultimate potential of the technique.

Figure 5 is a photograph of a typical optical mask or array of aper- 
tures which includes characters of a particular type font. 


FIGURE 5


Figure 6 shows a simplified optical layout of some of the elements 
used to perform the character recognition. There is a source of light, 
a diffuser, and a lens to concentrate the diffused light onto an unknown
transparent character. If opaque copy is used, this initial optical 
arrangement would be altered but the following discussion will remain 
essentially unaltered.

FIGURE 6.—Schematic diagram of complete text reader in a form suitable for
reading photographic transparencies. [Figure not reproduced; surviving labels: Photodetectors, Computer.]


The light passing through, or reflected from, the unknown trans- 
parent character then diverges and passes through the array of stand- 
ard apertures, and thence to a corresponding array of photodetectors. 
The electrical outputs from each of these detectors are ultimately 
processed and fed to the translator or to a recording tape for ultimate 
use with the translator. 

It is noted that the mask or array of apertures includes all of the 
characters of interest inclusive of the one being identified. Further-
more, if the unknown character is within the master set of apertures,
no motions, mechanical or electrical, are necessary to recognize or
identify the unknown. Consequently, with this method of optical
automatic print reading, speed is not limited by the optical recog-
nition technique. Currently the speed is controlled by such factors
as text alinement, poor copy, and so forth. 

When extended, imaginary lines, or optical rays, drawn through 
the center of the unknown character “A” and passing through all of 
the centers of all of the characters in the array of apertures pass 
through small holes placed in front of each of the photodetectors.
Figure 7 shows a photograph of the mask and also two photographs
of the distribution of light in the plane of the small holes covering
each of the photodetectors. If the letter “A” is the unknown, a bright
spot appears in the plane in front of the detector for recognizing “A.”


FIGURE 7

I think if we consider this page, this very last page, the upper
diagram is typical of the mask, and you notice the letter “A”, is 
three up and three over, on the mask. If you look at the lower left- 
hand picture, which I must say has been reproduced very poorly, there 
properly should be a bright dot in the position corresponding to the 
position of the letter “A.” Actually, in the original negatives which 
we have, this is much more obvious than in this multilith reproduc- 
tion. Similarly, if the dollar sign is used, as the unknown letter, in 
the next diagram one will observe a brighter spot in the second row, 
the second line down, indicating the identification of the dollar sign. 

This method of recognizing alphanumeric characters simultaneously 
correlates all of the characters in a particular type font with the un- 
known. The decisionmaking function is carried out electronically 
by means of threshold circuits at the outputs of the photodetectors.
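
As a rough digital illustration of the recognition scheme just described (compare the unknown against every character of a font at once, then let a threshold decide), the sketch below may help. It is not the Baird-Atomic design, which performs the comparison optically; the 3-by-3 patterns, the agreement score, and the threshold value of 0.9 are all assumptions made for the example.

```python
# Hypothetical 3x3 binary patterns standing in for the photographic mask.
# A real type font would of course hold far more detail per character.
FONT = {
    "A": (0, 1, 0, 1, 1, 1, 1, 0, 1),
    "L": (1, 0, 0, 1, 0, 0, 1, 1, 1),
    "T": (1, 1, 1, 0, 1, 0, 0, 1, 0),
}


def agreement(unknown, template):
    """Fraction of cells on which the unknown and the template agree."""
    matches = sum(1 for u, t in zip(unknown, template) if u == t)
    return matches / len(template)


def recognize(unknown, font, threshold=0.9):
    """Score the unknown against every character of the font, then apply
    a threshold to the best score. Returns None when nothing passes,
    i.e. the character is treated as not belonging to this font."""
    scores = {ch: agreement(unknown, tpl) for ch, tpl in font.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


print(recognize(FONT["A"], FONT))   # the pattern for "A" is identified
print(recognize((0,) * 9, FONT))    # a blank field passes no threshold
```

The threshold plays the part of the circuits at the photodetector outputs: only a sufficiently bright spot, here a sufficiently high score, counts as an identification.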

Furthermore, with only a slight increase in the optical complexity 
of the system, minor dissimilarities in characters and punctuation 
marks such as periods, commas, and semicolons are readily recog- 
nized. It is anticipated that appropriate identification of punctuation 
marks will be of primary importance in effective mechanical trans- 
lating. This is believed to be a feature of this approach to optical 
reading which is not inherent in other disclosed techniques. 

The Chairman. Your approach is entirely dissimilar, then, to the
approach made by these agencies that have heretofore testified before
this subcommittee; is that right?

Dr. Driscoll. Well, I did not hear the other testimony, Mr. Chair-
man, so I

The Chairman. Does your approach contemplate the use of a
computer?

Dr. Driscoll. Well, I think that I would prefer to have Mr. Fitz-
maurice answer that question in detail.

The Chairman. Mr. Fitzmaurice, would you answer that?

Mr. Fitzmaurice. The technique we are using is considerably dif-
ferent from that employed by other investigators. One of our most
important considerations in choosing this particular design was to 
avoid the need for a computer. 

The Chairman. Then your approach is to use light and shadows,
isn’t that right?

Mr. Fitzmaurice. In considering the various functions performed
in recognition we have tried to do as much as possible optically. We 
have gone a lot further in this direction than other people have, and 
the more we have been able to do optically the less complexity in the 
electronic system. 

The Chairman. When you say optically, do you mean visually?

Mr. Fitzmaurice. No; I always think in terms of a light source,
something like a lens in the path, and some photodetectors; that is, 
photomultipliers or other detectors. 

The Chairman. That is what you mean by optically?

Mr. Fitzmaurice. Yes.

The Chairman. What is the status of your project? Is it ready
for production?

Mr. Fitzmaurice. Right today it isn’t, no. We expect to deliver
a machine which will be quite suitable for providing an input to a
machine translator by the end of September or perhaps the first half
of October. 

The Chairman. So by the end of this year you should have one
that would be usable? 

Mr. Fitzmaurice. Yes. I believe there will be an additional 3 or
4 months which are required, not for the development of the machine, 
but rather for an examination of the literature which is chosen for 
translation, to make certain that we have obtained photographs of 
all the type fonts which are going to be used. 

So far there has been no decision as to what material will be trans-
lated initially.

The Chairman. You simply photograph the type fonts that will
be used, and then you place them in the reader, depending on the
characters of the item to be translated?

Mr. Fitzmaurice. That is correct. We store 12 type fonts at a
time in the machine, which is adequate for any—well, certainly 
adequate for any particular scientific paper, and usually adequate for 
a complete periodical, although there are periodicals which involve 
a larger number of type fonts. 

In this case we are fairly well off, because this wheel of 12 type
fonts is simply interchanged by the operator, and we now can put in
12 different type fonts. They are all on a photographic plate so that 
they are easily changed. 

The Chairman. Your machine will have the characteristics of read-
ing about a thousand characters per second?

Mr. Fitzmaurice. That is the instantaneous rate of reading. The
average rate will be dependent upon how much of a particular page 
may be blank, or how much time is lost due to spacing being filled up 
with graphs or photographs which are not being read. 

There is about a 30-percent loss in time between lines ordinarily. 
We scan 11 lines per second. 

The Chairman. How can you get 1,000 characters on 11 lines, then?

Mr. Fitzmaurice. They run around 60 characters per line.

The Chairman. That is only a little over 600 characters.

Mr. Fitzmaurice. Sixty milliseconds pass while we are reading a
line, and there is a 30-millisecond dead time while we are going back, 
so that the average rate comes out to be about 670 characters per 
second. This is a function of the particular type of material being 
read. As long as we are restricting ourselves to Russian literature, 
let’s say it is a general characteristic of the type of text which appears 
in literature; it seems desirable to leave that dead time in there, so 
the average reading rate comes out to be about 670 characters per 
second, although the instantaneous rate, one character to another, is 
about 1,000 characters per second. 
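
The arithmetic behind the two rates quoted in this answer can be checked with a few lines of Python. This is only a worked illustration of the figures given in the testimony (60 milliseconds to read a line, 30 milliseconds of dead time, about 60 characters per line); the function and variable names are invented for the example.

```python
# Timing figures quoted in the testimony.
READ_MS_PER_LINE = 60   # time spent actually reading one line
DEAD_MS_PER_LINE = 30   # return ("dead") time before the next line
CHARS_PER_LINE = 60     # typical line length given by the witness


def instantaneous_rate(chars_per_line, read_ms):
    """Characters per second while a line is actually being read."""
    return chars_per_line * 1000 / read_ms


def average_rate(chars_per_line, read_ms, dead_ms):
    """Characters per second once the dead time is included."""
    return chars_per_line * 1000 / (read_ms + dead_ms)


def lines_per_second(read_ms, dead_ms):
    """Number of complete line cycles that fit into one second."""
    return 1000 / (read_ms + dead_ms)


inst = instantaneous_rate(CHARS_PER_LINE, READ_MS_PER_LINE)
avg = average_rate(CHARS_PER_LINE, READ_MS_PER_LINE, DEAD_MS_PER_LINE)
rate = lines_per_second(READ_MS_PER_LINE, DEAD_MS_PER_LINE)

print(f"instantaneous: {inst:.0f} characters per second")
print(f"average:       {avg:.0f} characters per second")
print(f"line cycles:   {rate:.1f} per second")
```

Each line cycle takes 90 milliseconds, so about 11 lines are scanned per second, and 60 characters every 90 milliseconds is roughly 670 characters per second. The dead time is one third of each cycle, in line with the "about a 30-percent loss" mentioned earlier.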

The Chairman. How much has your company expended on this
program? Is it all Government money or is it private money also? 

Mr. Fitzmaurice. There was a feasibility study—I don’t remember
the date—$49,000, to demonstrate to the Air Force the fact that this 
particular method of recognition would indeed do the job. 

This was followed with a hardware program for $380,695, to build 
the present developmental model which incorporates not only a demon- 
stration of the optical recognition procedure, but a demonstration of 
all of the other components involving alinement of the text, recogni-
tion of characters, recognition of the fact that you have not recognized
a character; that is, recognition of the fact that the character in the 
field of view is not in the type font that you are comparing it with 
at present, so you remember the position on the line and you try 
another type font. 
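
The fallback just described (note that a character was not recognized, remember the position, and try another type font) might be sketched as follows. This is an assumption-laden illustration, not the machine's actual control logic: glyphs are represented by made-up string labels, each stored font is a plain dictionary, and the order in which other fonts are tried is simply the order on a hypothetical wheel.

```python
def read_line(glyphs, fonts, start_font=0):
    """Recognize each glyph on a line, switching fonts on failure.

    glyphs: sequence of hypothetical glyph labels, one per position.
    fonts:  list of dicts mapping a glyph label to its character,
            standing in for the type fonts stored in the machine.
    Returns a list of characters, with None wherever no stored font
    contained the glyph.
    """
    result = []
    current = start_font
    for glyph in glyphs:
        # Try the font now in use first, then the rest in wheel order.
        for offset in range(len(fonts)):
            index = (current + offset) % len(fonts)
            if glyph in fonts[index]:
                result.append(fonts[index][glyph])
                current = index  # stay with the font that worked
                break
        else:
            result.append(None)  # not in any stored font
    return result


# Two tiny made-up fonts: the second holds an italic form of "A".
fonts = [{"a-roman": "A", "b-roman": "B"}, {"a-italic": "A"}]
print(read_line(["a-roman", "a-italic", "b-roman", "unknown"], fonts))
```

Staying with whichever font last succeeded reflects the expectation that runs of text are usually set in a single font, so most characters are found on the first try.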

There were a number of such things which became involved when 
we went from the feasibility study into a development model. We 
immediately went to a higher speed which for various reasons means 
we needed more light. It might be beyond the scope of this hearing 
to explain why more light is connected with the higher speed, but 1 
think this is fairly obvious to any optical engineer who has been 
concerned with the signal-to-noise problems and the fact that the 
band width increases with the reading speed. ” 

This money has all been Air Force money. In addition there has 
been Baird-Atomic and other commercial money applied to this work. 

The Chairman. Any questions from the gentlemen of the com-
mittee?

Mr. Bass. Yes, I have. 

The Chairman. All right, Mr. Bass.

Mr. Bass. This machine, as I understand it, this automatic print 
reading machine doesn’t actually translate, we will say Russian into
English?

Mr. Fitzmaurice. That is correct. It recognizes a particular
character, and then transmits this information to a machine translator. 

Mr. Bass. So that it would be useful in the event that we produced 
an automatic translating machine that needed normally a manual 
input. This would take the place of a manual input?

Mr. Fitzmaurice. That is correct. A major factor in the cost of
machine translation is having the typist prepare this punch-paper 
tape. Using a typist to prepare this input, it would be more expen- 
sive to prepare the input than it would be to perform the translation. 

We expect to be able to beat the cost of a typist by a factor of five, 
initially, as has been stated. That is rather conservative, since 
it assumes a complete writeoff of all the development and production 
in 10,000 hours of use, as well as all the operating costs, with a fairly 
generous allowance for maintenance. 

I expect at a later date, when we can start mechanizing the photog-
raphy, that we may get an additional factor of two here so that we
can beat the cost of typing by a factor of 10.

Mr. Bass. You never envisioned making a machine, adopting this 
principle that would actually do translation?

Mr. Fitzmaurice. Oh, no. This is to prepare the input, to scan
the original Russian text, and to transmit this text to the machine 
without significant intervention of human operators. 

Mr. Bass. That is all. 

The Chairman. Mr. King.

Mr. King. Where does the word “Atomic” come into the picture? 
[Laughter.]

Mr. Baird. Up until 1956, the company was Baird Associates. We
merged with Atomic Instrument Co. at that time and changed the
name.

We are working quite a bit in the field of atomic measuring
equipment.


Mr. King. Thank you. That is all.

The Chairman. Mr. Miller.

Mr. Miller. No questions.

The Chairman. Does the print reading technique used by Baird-
Atomic have any application to other automatic recognition prob-
lems of military importance? 

Mr. Fitzmaurice. Yes, it does. There is a limit to how much can
be said in an open hearing, of course. We have been working for the 
past 5 years—not so much on character recognition, as on more gen- 
eral recognition problems; in particular, the problem of automatic 
photo interpretation. 

Initially, the work was classified. We have also done unclassified 
work, though. The real application there is due to the fact that even 
today the ability of reconnaissance vehicles to acquire photographs far 
exceeds the capability of the Air Force to look at them, and it is not 
so far in the distant future that the potential output of unmanned re- 
connaissance vehicles would tax the ability of the Air Force to look 
at the pictures, even if every man in the Air Force were a good photo 
interpreter. This means that in order to really take full advantage 
of your output, you’ve got to do a substantial portion of the photo 
interpretation by machine. An associated problem relates to the in- 
formation available to reconnaissance vehicles—especially an un- 
manned one—which would not have any recovery of film. The in- 
formation available far exceeds the capability of any transmitter to 
return it to the ground. Consequently, it is necessary to do a certain 
amount of processing of information within a vehicle to make full 
use of its potential. 

Baird-Atomic has been very interested in this during the past 5 
years, but I do not think we can say anything more in an open hearing. 

The Chairman. When the Baird-Atomic print reader is delivered
to the Air Force, will it be completely ready for operational use with-
out any further development?

Mr. Fitzmaurice. The machine itself will be ready. I believe there
is a period of 3 or 4 months needed to photograph all of the type
fonts after a decision has been made as to which literature is to be 
translated initially, and then a continuing effort as new type fonts 
turn up. 

There have been a number of significant studies which relate to the 
selection of fonts. New York University, as may have been men- 
tioned here earlier, has made a detailed study of the statistics of ap- 
pearance of type fonts in the Cyrillic—rather, of the literature which 
uses the Cyrillic alphabet, and have come up with the fact that certain 
type fonts do appear with much greater frequency than others. This 
has been used as a partial guide. 

However, even though one particular type font may dominate 90 
percent of all text, in order to completely read a periodical you must 
be able to read also type fonts which may only appear for 50 charac- 
ters out of 100,000. It is necessary to continue the statistical examina- 
tion of the Russian literature which is selected for translation. I 
think this is a 3- or 4-month job, but even before this is undertaken we 
are ready to, or will be ready in the first part of October to be scan- 
ning a fairly large amount of Russian literature. 
The Chairman. What use of the print reader is recommended for
the period immediately after the final acceptance test? 

Mr. Fitzmaurice. I think, again, referring to the same 3- or 4-
month period, it would be desirable not to try to put together a com-
plete translation complex before the end of the year—before January.

I would rather see the print reader remain on the Baird-Atomic
premises while we generate magnetic tape which could then be fed
into the translator in the interim period. I think Dr. King of IBM
feels that within a few months after the machine is used together with 
his translator, we can dispense with the magnetic tape, which is a 
buffer system, and go directly from the reader into the computer. 

However, during the first few months of operation, it is inevitable 
that not all these machines are going to be working at the same time. 
There are going to be breakdowns. By having the magnetic tape 
output of one machine and the magnetic input for the next one, we can 
build up a backlog. 

Likewise, if necessary, the machines can be kept separate—perhaps 
even in different cities for a few months. We could generate a mag- 
netic tape output from the reader, the mechanical translator being 
built by IBM. 

The Chairman. The next question I was going to ask is: Is it neces-
sary that they are in the same city?

Mr. Fitzmaurice. No, it isn’t. They can be as far apart as desired.
It is one of the advantages of having a general so-called buffer system 
between the two units; that is generating magnetic tape. Even the 
choice of magnetic tape was determined by the high reading speed here. 
Various special purpose business-type reading machines have used 
paper tape outputs. This is an easy thing to hook up. It is just that 
there are no paper tape punches that would operate at the high speed 
involved here, so that it would be a little bit wasteful of the capabilities 
of the machine. You would have to slow down the machine to give a 
paper tape output. It would be simpler to generate the magnetic tape 
and use a converter from that to put it on paper tape if for some reason 
anyone wanted it. I don’t recommend it, because the paper tape, of 
course, is not reusable, and represents a very large fraction of the cost 
of translation. 

The Chairman. You sort of answered the next question, too, here,
which is: What is being done to mechanize the printing of the English
output of the translator?

Mr. Fitzmaurice. I don’t think I have answered that.

The Chairman. You approached that, haven’t you?

Mr. Fitzmaurice. Yes; I suppose I was getting close to it. I sup-
pose the Air Force witnesses will mention the fact that they have used
a high-speed printer or have adapted a high-speed printer, such as is 
commonly used with computers, to print the English text. This is 
very satisfactory as an interim measure. 

One of the serious problems of machine translation which doesn’t 
strike the person who isn’t going to use it, is that you really can’t let 
someone else do your reading for you if you are a working scientist. 
For example, 10 years ago maybe there were 30 people in the United 
States who would have appreciated some of the fundamental literature 
coming out which was ultimately of significance in machine print 
reading. Now, I cannot imagine giving this responsibility to one or 
two people who are going to be scanning Russian literature who could
possibly anticipate this need.

The real thing is that you must get translations into the hands of a 
very large group of people. The possibility of letting a small number 
of people decide what is to be translated is currently a necessity, but 
highly unsatisfactory from the standpoint of the scientist who is going 
to make use of it. 

The plain fact is that I know that I could never feel confident that 
someone else scanning the Russian literature would be able to pick 
out the things that I could make use of. 

The Chairman. Well, one idea may be of great help to you, which
would be of no help at all to the next scientist. 

Mr. Fitzmaurice. That is correct. This is the real need, the real
stimulus for this need for translation. Now, right along with that 
comes the fact that if this is to be achieved then the individual scientist 
still has to look at a fairly large amount of literature, even to read the 
English literature is quite a problem. No one does quite as much of 
it as they would like. 

In order to make it usable it is necessary that the output be typo- 
graphically acceptable. It must employ several type fonts. The com-
position must be comparable at least to what appears in newspapers.
You cannot simply type out a manuscript as you would on a type- 
writer. I am sure that anyone on this committee can appreciate that 
a typewritten manuscript of all the hearings would be awfully difficult 
to get through with no headings in different type fonts to set off the 
sections. 

The Chairman. We find that to be the case. [Laughter.] What
has been done to mechanize the handling of photographs and other 
graphic material which appears in the original Russian text? 

Mr. Fitzmaurice. The approach being taken by the Air Force and
the group of contractors currently working with the Air Force is 
fairly long-range. On the long-range problem for a complete solution, 
Syracuse University has a study contract at present. This, according 
to my understanding, has now led to some recommendation—well, I 
believe some requests for bids will be issued soon by the Air Force in 
this general field. 

However, as an interim solution, Baird-Atomic and IBM and Syra- 
cuse University and Rome Air Development Center have sort of 
gotten together and decided just what we are going to do meanwhile. 
We are going to simply opaque the material, these inserts, before they 
are photographed, so the print reader will only read the text. Then 
the inserts will be handled separately. 

Later we hope that the handling of these inserts will be also mech- 
anized. Initially it is at a fair additional cost. 

The Chairman. One additional question. What are the plans and
interest of Baird-Atomic to exploit print reader applications com-
mercially?

Mr. Fitzmaurice. I think perhaps Dr. Baird had better answer this
one. 

The Chairman. Is that more of a commercial question rather than
a technical question, Doctor?

Dr. Baird. We are quite aware of the need for fast reading of com-
mercial records, such as bills, statements, charge-plates, subscriptions,
addresses, records, and so forth. We are studying this problem quite
carefully at the moment to determine in just what direction we wish 
to go. 

The Chairman. It has possibilities, though, doesn’t it?

Dr. Baird. It certainly does.

The Chairman. The colonel has a question or two he would like to
ask. The colonel made a thorough study of this. 

Colonel Ditton. Yes, sir. 

Yesterday the Department of the Army testified that they were 
contemplating a research and development program for a character- 
recognizing machine. During that testimony it was brought out that 
they had reviewed, or that they were familiar with the Baird-Atomic 
approach, and that the Baird-Atomic approach was not satisfactory 
for their needs. Now, would you care to comment on that? 

Mr. Fitzmaurice. Well, no representative from the Army has dis-
cussed their needs with me in a way that made me indicate that their
needs are any different from the Air Force needs. Perhaps they are, 
but it hasn’t been brought out. 

In the reverse direction, I regret that we haven’t been able to make 
available to the Army as much information about our system as we 
would like. There was a report written on our original feasibility 
study, which wouldn’t say very much about the final developmental 
model, it would only say something about the general method of 
recognition. 

Now, there was a meeting recently of the Committee on Documenta- 
tion of the Intelligence Board at which we presented a good deal of 
information on our system to various members of the intelligence 
community. 

I think the emphasis there perhaps was on the various mechanical, 
optical, and electronic components. It did not go very much into the 
output or the compatibility with requirements of various groups. 

I think it is a little misleading to people who have not actually 
worked on these systems, but compatibility is really not a serious 
problem. We currently generate outputs in the form of two six-bit 
characters for each Russian character, which means 12 positions on a 
tape for each Russian character. This can be fed directly to any 
machine that can handle it, or it can be put on magnetic tape. 

If it is on magnetic tape, then it can be used by any of the large 
computers since we do put it out in magnetic tape form. It can be
converted to punchcards, or paper tape.
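
The output format mentioned here, two six-bit characters for each Russian character, can be illustrated with a short sketch. The testimony does not say how the 12 bits are assigned, so the scheme below (a single character index split into a high and a low six-bit half) is purely an assumption for the example, as are the function names.

```python
SIX_BIT_MASK = 0x3F  # 63, the largest value one six-bit code can hold


def encode(index):
    """Split a character index (0 to 4095) into two six-bit codes."""
    if not 0 <= index < 64 * 64:
        raise ValueError("index does not fit in 12 bits")
    return index >> 6, index & SIX_BIT_MASK


def decode(high, low):
    """Rebuild the character index from its two six-bit codes."""
    return (high << 6) | low


def to_tape(indices):
    """Flatten a run of character indices into a stream of six-bit codes,
    two codes (12 tape positions) per character."""
    tape = []
    for index in indices:
        tape.extend(encode(index))
    return tape


tape = to_tape([0, 65, 4095])
print(tape)                        # six codes for three characters
print(decode(tape[2], tape[3]))    # recovers the second index, 65
```

Twelve bits allow 4,096 distinct values, far more than one alphabet requires, which would leave room to carry other information alongside the character itself; the testimony does not say whether the machine uses the spare capacity that way.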

All in all, I am not sure just what the Army need was that we did 
not meet. I would be very interested in discussing it with them to 
find out what their problem was. I think it might not be as difficult 
as they might assume. 

Colonel Ditton. Well, will you bid on the Army’s proposal? 

Mr. Fitzmaurice. We would be delighted to.

Colonel Ditton. There was one other point that you were making 
which I don’t think was emphasized enough. Will the product of 
your machine be compatible with all the various machine translating 
mechanisms that are now in existence?

Mr. Fitzmaurice. We designed with the IBM machine in mind.
Anyone who is using a large general purpose computer in their trans-
lation—and I think that covers almost everyone else—can handle the
magnetic tape input that we will give them. I don’t know of any
other special purpose machines. I would have to hear the details of
them. 

The Chairman. Are there any further questions?

It is a very, very interesting matter that we have been listening 
to, and I want to assure you gentlemen from Baird-Atomic that we 
appreciate—the committee appreciates—your giving us the benefit 
of your discoveries and your inventions in perfecting this equipment. 

Now, if there is no further business of the subcommittee, we will 
now adjourn until 10 o’clock Monday. 

We thank you gentlemen very much. 

(Whereupon, at 12:07 p.m., the subcommittee adjourned, to re-
convene at 10 a.m., Monday, May 16, 1960.)
MONDAY, MAY 16, 1960


HOUSE OF REPRESENTATIVES,
COMMITTEE ON SCIENCE AND ASTRONAUTICS,
SPECIAL INVESTIGATING SUBCOMMITTEE,
Washington, D.C. 

The subcommittee met at 10 a.m., Hon. Overton Brooks, chairman, 
presiding. 

The Chairman. The subcommittee will come to order.

This morning we have two important witnesses before this hearing, 
Brig. Gen. Donald P. Graul, commander, Rome Air Development 
Center, Griffiss AFB, New York; and Dr. Gilbert King, director of 
experimental systems research, IBM research center. 

I think it would be proper to divide the time between the two wit- 
nesses. We will, if necessary, go until 11 o’clock with General Graul. 
Following that, we will give the time to Dr. King, from the IBM re- 
search center. I think we can cover it very well. 

General Graul, you came down from Rome, N.Y. I had the pleasure 
of serving on a committee that made the recommendations regarding 
the setting up of the Rome Air Development Center.

General Graul. I can recall that.

The Chairman. The weather was bad; it was rough there, but I
suppose that is not very unusual. 

General Graul. No; that is true even in Washington, I guess.

The Chairman. Yes.

You have a prepared statement, and we have your record. Every 
member of the committee has a copy of your background and 
experience so that they will know what you stand for in the Air Force. 
If you will proceed with your prepared statement, we will follow that 
with questions. We are certainly glad to have you here, General. 

General Graul. Thank you, sir.

The Chairman. Will you give the names of all of those supporting
you to the reporter? You have Mr. Robert F. Samson. 

General Graul. Mr. Samson on my left.

The Chairman. You have Mr. George Shiner. He’s on your left,
too. And Mr. Oskar Reinson on your right. 

All right, the reporter has those, and if you will proceed with your 
statement we will appreciate it. 
STATEMENT OF BRIG. GEN. DONALD P. GRAUL,¹⁷ COMMANDER,
ROME AIR DEVELOPMENT CENTER, AIR RESEARCH AND DEVEL-
OPMENT COMMAND; ACCOMPANIED BY ROBERT F. SAMSON,
GEORGE SHINER, AND OSKAR REINSON, ROME AIR DEVELOP-
MENT CENTER, AIR RESEARCH AND DEVELOPMENT COMMAND


General Graul. Mr. Chairman, the Committee on Science and As-
tronautics of the House of Representatives has requested the U.S.
Air Force to present its program in automatic language translation.
I consider it a personal privilege to present the Air Force program; 
and I trust that this presentation will provide all the pertinent in- 
formation. 

The Air Force presentation on automatic language translation will 
consider the need, the status, and the future objectives. We are pre- 
pared to elaborate upon the individual details, as well as upon the 
overall completeness of the Air Force research and development pro- 
gram in automatic language translation. 

The Air Force research in automatic language translation is de- 
termined by the basic need to provide a capability for continued as- 
sessment of foreign scientific and technical literature, both for the 
scientific and intelligence community. At the present time the large 
volume of foreign language literature results in a 5- to 8-month lag
which prevents timely utilization of the information. The Air Force 
has formally recognized this basic need. 

An understanding of “Why” the Air Force has undertaken a pro-
gram of automatic language translation research is implied in the
basic need discussed previously as well as by the nature of the transla-
tion requirements of these agencies. The Assistant Chief of Staff,
Intelligence, requires extensive translation of reports and publications, 
The Aeronautical Chart and Information Center requires translation 
of source documents that are used for the production of charts and 
topographical material. The Aerospace Technical Intelligence 
Center requires detailed translation of scientific literature in special- 
ized fields. The Air Research and Development Command must stay 
abreast of the scientific progress of all foreign nations in all fields and 
use this knowledge in its own research programs. 

Chronologically speaking, the present Air Force program in auto-
matic language translation began with the search for a large-capacity,
high-speed device for use in lexical data handling. And, as you know,
“lexical” means words arranged in an alphabetical order, as in a dic-
tionary. This search led us to a glass disk storage device, which had
been invented by Dr. Gilbert King. At this point, it was recognized
that the glass disk photoscopic memory would be the means to an eco-
nomical and fast method in automatic language translation. The
logical result was the formulation of a concrete program for auto-
matic language translation. I might add that the extensive require-
ments for language translations have forced the Air Force to spend
over $2 million a year in order to satisfy these Air Force needs.

17 Brig. Gen. Donald P. Graul has been commander of Rome Air Development Center,
Air Research and Development Command, since August 1957. He is a graduate of the
U.S. Military Academy class of 1929. After the war General Graul served in various
research and development assignments, including commander of Watson Laboratories,
Red Bank, N.J., and Deputy Chief of the Electronics Subdivision, Air Materiel Command,
Wright-Patterson Air Force Base, Ohio, and after graduation from the Industrial College
of the Armed Forces in 1949 he became Chief of the Electronic Division in the Directorate
of Research and Development, Headquarters, U.S. Air Force.

General Graul was born on Feb. 24, 1906, in Lehighton, Pa. He received his master’s
degree from the California Institute of Technology in Meteorology in 1937. General
Graul has received the following decorations: the Legion of Merit with one oak leaf clus-
ter, the Bronze Star with one oak leaf, the Croix de Guerre with palm (France) and the
Croix de Guerre with palm (Belgium).

As for the status of the Air Force automatic language translation
program, we are supporting an extensive effort at the International
Business Machines Corporation Research Center that has resulted in
an experimental model of a fully automatic dictionary lookup tech- 
nique. It accepts Russian word inputs, searches for the English 
equivalent and produces these English equivalents at the rate of ap- 
proximately 30 words per second. The experimental model has been 
operating since last April and has clearly demonstrated its ability to 
perform its basic language translation function. 

The Air Force has at the same time supported a research and de-
velopment program on automatic print reading which will permit us
to read Russian literature automatically, and at a rate comparable to 
the speed of the automatic dictionary. The feasibility of the print 
reading technique has already been demonstrated, and we expect that 
a more refined model will be tested by the end of this year. Our 
present research is centered essentially on the use of the automatic 
dictionary as a research and development tool, so that we can auto- 
matically obtain the proper grammatical arrangement of the words 
in the sentences and retain the shades of meaning contained in lan- 
guages. Concerning the print-out of the translation in English, re- 
search is in progress for the development of a technique for automatic 
insertion of equations, graphs, charts, and pictures. 

We believe the Air Force Research and Development in automatic
language translation is a sound program, and that the integration of
all of the above major developments can be successfully accomplished
and demonstrated by the fall of 1961.

An important element in the Air Force program is participation in
other automatic language translation research efforts. We maintain 
direct cooperation with the National Science Foundation, and with 
the Central Intelligence Agency. The Air Force also keeps abreast 
with automatic language translation developments of the Army, the 
Navy, the National Bureau of Standards, as well as with all other 
research efforts throughout the United States. 

Our future objectives are divided into two parts. The first part is 
the achievement of the automatic language translation complex that 
I previously spoke of as being ready in the fall of 1961. The auto- 
matic language translation complex will be able to translate on a 
continuing basis the entire Russian scientific output. It would, we 
believe, do this on a practical and economical basis. 

In the second part, we plan to work on the improvement of the 
quality of the translation. It is expected that once the feasibility of 
automatic language translation has been established, this capability 
can also be expanded to other languages. In most cases for the 
Western languages, the basic work of the development will have been
covered by our experiments with the Russian language. Since the 
intelligence and scientific activities of the Chinese C 
program for automatic language translation of Chinese into English
would be of great importance to the United States. 

The Air Force is active in the development of an automatic language 
translation capability primarily because of the Air Force interest in 
the translation output. I would like to stress that the successful de- 
velopment of the photoscopic disk memory has realized a capability 
in automatic language translation which can handle a large amount of 
the translation needs of the Nation. 

I feel that these Air Force developments will be of great benefit to
the United States and the Air Force stands ready to offer technical 
assistance in any way possible to any translation facility that may be 
established. 

If you, Mr. Chairman, or any members of the committee are in- 
terested in the development at IBM, we will be glad to have you visit 
the IBM Research Center at Yorktown Heights, N.Y., at your con- 
venience. 

The CHAIRMAN. Thank you very much, General, for your fine state-
ment. 

I notice you say Mr. Samson will present further details. 

General Graul. Yes, sir. Mr. Samson is Chief of the Data Utiliza-
tion Branch in the Intelligence Laboratory at Rome Air Development 
Center. 

The CHAIRMAN. Does he have a prepared statement, too?

General Graul. Yes, sir; he does.

The CHAIRMAN. All right, then we will proceed, Mr. Samson, with
your statement. 

Mr. Samson. In order to present a complete picture of the Air Force 
program in automatic language translation, I will first touch briefly 
on some historical highlights on this research. 

In July of 1949, Dr. Warren Weaver of the Rockefeller Founda- 
tion, advanced the suggestion that it should be possible to translate 
languages automatically by electronic processing techniques. In No- 
vember 1949, Prof. Erwin Reifler of the University of Washington, in 
Seattle, began research on automatic language translation. In 1952, 
his research was supported by grants from the Rockefeller Founda- 
tion. Dr. Reifler’s research team designed and constructed in 1954 a 
pilot model for German-English language translation. This pilot 
model was a simple machine, using electrical relays that operated with 
a limited vocabulary. However, in spite of these limitations, it could 
translate a number of selected sentences. 

In 1953, Dr. Gilbert King, while with International Telemeter 
Corp., formulated a new concept for large-scale storage and data 
processing. At that time, attention was drawn to the possibility of 
using his invention, a photoscopic glass disk memory, for automatic 
language translation. In 1954, the Air Force sponsored its first re- 
search effort in machine translation. This initial effort involved ex- 
periments with an automatic Russian-English dictionary. A mag- 
netic drum and assorted electronic circuitry were used to look up 
meanings of the Russian words. This work was done at the Harvard 
Computation Laboratory by Prof. Anthony Oettinger. 

Early in 1955, the Air Research and Development Command, and 
the Rome Air Development Center recognized automatic language 
translation as an area of major importance to the Air Force. 




An in-the-house effort was started and concerned itself with the 
collection of available information on automatic language translation 
research. This initial effort involved a considerable amount of 
library work, as well as visits to universities and research organiza- 
tions for discussions with scientists interested in automatic language 
translation. As a result, a coherent plan and technical approach
was formulated for a complete automatic language translation
complex in which we defined three major areas of research.

The first area concerned automatic input, or more commonly called 
print reading, which is a technique that scans the foreign language 
words and transforms the alphabetic data into electrical signals. The
second area involved techniques, and machines for translating the 
foreign language words into English. The third area concerned an 
automatic output, visualized as being capable of inserting equations, 
graphs, charts, and photographs into the output in correspondence 
with the translation of the textual material. The quality of the 
printed English output should be comparable to good publication
standards.

We started work in the second area which we regarded as the one 
requiring the greatest effort in research and development. In
1956, the Air Force started work with Dr. Gilbert King for research
and development on his invention—the photoscopic disk. The photo-
scopic disk is to serve as the heart of the planned translation complex.
At the same time research was started with Prof. Erwin Reifler at
the University of Washington to direct the development of a Russian-
English dictionary. This dictionary was to be stored on the photo-
scopic disk. This research has been successful.

An automatic dictionary has been constructed and has been working 
since last April on a word-for-word translation concept. 

I would like to insert a remark here. We are modest in saying 
“word for word.” We actually translate two- and three-word se-
quences and long strings of words at a time.
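
Editorial illustration: matching multiword sequences as well as single words, as described in the remark above, resembles what is now often called longest-match dictionary lookup. The Python sketch below is hypothetical. The transliterated entries and the names DICTIONARY and translate are assumptions made for illustration; they are not taken from the Air Force dictionary or the IBM equipment.

```python
# Hypothetical sketch of longest-match dictionary lookup.
# The entries are invented placeholders, not the actual dictionary contents.
DICTIONARY = {
    ("v", "techenie"): "during",
    ("tak", "kak"): "because",
    ("v",): "in",
    ("kak",): "how",
    ("tak",): "so",
    ("god",): "year",
}
MAX_ENTRY_WORDS = max(len(key) for key in DICTIONARY)


def translate(words):
    """Replace the longest matching word sequence at each position."""
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate sequence first, then shorter ones.
        for n in range(min(MAX_ENTRY_WORDS, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in DICTIONARY:
                output.append(DICTIONARY[key])
                i += n
                break
        else:
            # No entry of any length: pass the source word through unchanged.
            output.append(words[i])
            i += 1
    return output


print(translate("tak kak god".split()))  # ['because', 'year']
```

Trying the longest sequence first means the two-word entry is used before the single-word entries for its parts are considered.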

Our present research and development effort is to extend this capa- 
bility to a sentence-by-sentence and paragraph-by-paragraph trans- 
lation to achieve the proper word order and to improve on the trans- 
lation of the word meanings. Later on, we placed numerous other 
contracts in linguistic research. All of this research is intended to 
contribute to the sentence-by-sentence and paragraph-by-paragraph 
translation. Today the Air Force has 10 research and development 
contracts, 6 with universities and 4 with U.S. industrial concerns. 
Two of the contracts are with European universities. 

At the present time, the input to the automatic dictionary is a 
perforated paper tape of the Russian text, prepared manually by a 
typist who does not need to know Russian. This input is done on 
an electric typewriter having a Russian keyboard. By this method,
only 40 words a minute can be processed into the translation machine. 


To overcome this slow and costly manual preparation of informa-
tion for automatic processing, the Air Force started development of 
automatic print reading in August 1957. This initial effort on the 
automatic input was for an English print reader which could read,
in one type style, all characters, both uppercase and lowercase.

In March 1958, as success of this development became apparent, an
Air Force contract was awarded to New York University for study- 
ing characteristics of Russian printing styles. On the basis of the 
New York University study, active research and development was 
started on a Russian print reader. The Air Force now has a contract 
with Baird-Atomic, Inc., for the design and development of the 
Russian print reader. 

The feasibility of the machine has been demonstrated, and the
reader should be ready for experimental testing in September of 1960.
Integration of the reader into the automatic language translation
equipment is planned for December 1960. This reader can be ex-
pected to increase the speed of feeding information into the translation
machine up to 6,000 words a minute, which is 150 times faster than
the human operator. As a result, automatic print reading will re-
duce the cost and the input error into the language translator.

For the third research area (which is to provide an automatic out-
put), we are now studying the methods for automatically processing
nonword occurrences such as scientific equations, graphs, charts, and
pictures in order to make the output as similar as possible to the
original text. Many commercial companies have developed standard
equipment that provides a high-speed output for printed text. There-
fore, our task is to combine this high-speed printing output with
the original equations, charts, and graphs, so as to maintain the orig-
inal format. Construction on a machine is scheduled to start this
year.

In the area of applied language research, efforts are centered on the
problem of multiple meanings, and on the techniques for the order-
ing of words in a sentence in a manner to satisfy the requirements of
the language.

Language research in our program to date has been concerned with 
what was needed to supplement the dictionary stored on the photo- 
scopic disks and corresponding language processing equipment to give 
a reasonable translation. The process of evolutionary research in this 
area should continue until the translation output becomes acceptable 
to a large majority of users. 

The Air Force has a close cooperative research arrangement with 
the National Science Foundation, on the basis of a Government cross- 
service order. This National Science Foundation-Air Force Research 
arrangement in the area of automatic language translation dates from 
1956. The National Science Foundation and the Air Force jointly 
support linguistic research at Cambridge University, England. This 
joint program also supports research at Harvard University for an 
elaboration of the predictive syntactic analysis approach. This is an 
extension of the analysis method initiated by the National Bureau of 
Standards. 

With other agencies, the Air Force cooperates in the exchange of 
research data. The Air Force is keeping abreast of technical develop-
ments in automatic language translation conducted by the Army, the 
Navy, the Central Intelligence Agency, and the National Bureau of 
Standards. The Army project, at the University of Texas, concerns
research for the translation of German military terminology by means 
of mobile field data computing equipment. The Navy project, at 
Wayne State University, is concerned with the Russian-English trans- 
lation in the field of mathematics. With the Central Intelligence 
Agency project at Georgetown University, there has been close ex- 
change of technical data. The National Bureau of Standards project 
is concerned with translation of Russian, by means of their predictive 
analysis technique. 

The Air Force is now represented on the Subcommittee on Mechan- 
ical Translation of the Committee on Documentation of the U.S. In- 
telligence Board and on the Interagency Committee on Machine Trans- 
lation Research. A member of the Air Force serves as the U.S. rep- 
resentative on the Mutual Weapons Development Team, for exchange 
of technical data on automatic language translation research with 
France. 

The total Air Force research and development expenditures in auto- 
matic language translation amount to $4,800,000 up to and including 
fiscal year of 1960. Approximately 70 percent of the funds were spent 
for the development of feasibility models; the remaining 30 percent 
were expended on linguistic research.

The automatic language translation program of the Air Force 
promises other additional benefits. For example, automatic transla- 
tion of the spoken language may be achieved much sooner because of 
the techniques that have been developed in automatic language trans- 
lation research. 

The approach of developing the glass photoscopic disk, used as the 
memory for a dictionary, has opened the door to a new field of data 
processing that uses table lookup functions instead of computations. 
This data processing technique is also ideally suited for the language 
problems that are associated with the indexing, abstracting, storage, 
and retrieval of documents. 

To summarize the Air Force development program on automatic 
language translation, our goal is to demonstrate feasibility of a lan- 
guage translation complex in the fall of 1961. 

You will note an artist’s rendition of this complex is shown in 
figure 8. 

May I also invite your attention to the sample output on page 10 of
our brochure.

The CHAIRMAN. What is that on page 10?

Mr. Samson. A sample output. 

The CHAIRMAN. That is what we are looking for.

Mr. Samson. This sample is a copy of the output as it comes from 
the machine. 

We expect very great improvement here, and by the fall of 1961 the 
complex should produce machine translations at the rate of 100,000 
words per hour. This rate is the equivalent of translating a 300-page 
book in 1 hour.

Assuming that the progress continues at the present rate, the quality 
of the output should be acceptable to the majority of users who are 
interested in the scientific content of foreign literature. This con- 
cludes my presentation. I am most grateful for the opportunity to 
be of service to the committee. Thank you. 

Mr. Chairman, I would like to say I have with me today Mr. 
George Shiner and Mr. Oskar Reinson, of the Rome Air Development 
Center. 

The CHAIRMAN. Do the other representatives attending here with
you, Mr. Shiner and Mr. Reinson, have any statements?

Mr. Samson. No, they do not, sir. 

The CHAIRMAN. They are backup witnesses?

Mr. Samson. Backup witnesses; yes, sir.

The CHAIRMAN. I wanted to ask a few questions here.

You told me about the sample, that is what I wanted to see. Has
this been touched up at all?

Mr. Samson. No, it has not, sir. 

The CHAIRMAN. It is just as it comes from the machine?

Mr. Samson. Just as it came from the machine.

The CHAIRMAN. What machine?

Mr. Samson. This is the IBM—the Air Force installation at the
IBM laboratory.

The CHAIRMAN. This is the Air Force installation at the IBM
laboratory?

Mr. Samson. Yes, sir. This is the experimental model we have 
there now. 

The CHAIRMAN. This is an IBM computer, then?

Mr. Samson. No, sir; we prefer to call it the automatic language
translator at this time.

The CHAIRMAN. That gives it an Air Force complexion; does it?

Mr. Samson. Yes, sir.

The CHAIRMAN. All right.

Mr. Samson. If you will notice on the chart, sir, the center portion 
of the U.S. Air Force translation complex, as we call it, deals with 
the glass dictionary in the language processing. Dr. King of IBM 
is going to give a detailed presentation of that when they are on. 

The CHAIRMAN. Now, let me ask you about this sample here. You
simply fed into the machine an electronic tape?

Mr. Samson. No. As it stands now, it is paper tape.

The CHAIRMAN. Paper tape. Out comes this; is that right?

Mr. Samson. That is right. This is prepared on a paper tape prior 
to insertion into the machine. The input is the slowest part of the 
system now. 

The CHAIRMAN. Then the machine brings it out like it is here; is it
on this paper?

Mr. Samson. Yes, sir.

The CHAIRMAN. Is this an actual sample as taken from the
machine?

Mr. Samson. That is a photograph, because we had to make up a 
lot of them, but it hasn’t been touched up. 

The CHAIRMAN. This is a photograph?

Mr. Samson. Yes, sir.

The CHAIRMAN. But this is not the original sample?

Mr. Samson. No, sir.

The CHAIRMAN. You don’t have the original sample here, do you?

Mr. Shiner. Yes; we do.

The CHAIRMAN. I would like to see the original sample.

Mr. Samson. Fine. We have one here.

The CHAIRMAN. At this rate, this comes out how fast?

Mr. Samson. Because of the limitation of the output, that is, the 
machine itself, the output is 40 words per minute. The capability of 
the disk itself is 30 words per second, right at this time. I have a
sample of the disk here, if you would like to see it. This is what we 
are talking about. Maybe it would help the committee if they could 
see it (fig. 9). 


FIGURE 9.—Dictionary in the round: Heart of the automatic language translation
complex is the memory disk or dictionary—a glass word warehouse 10 inches 
in diameter which has as many entries as Webster’s Dictionary. The 550,000 
Russian-English words are stored in concentric tracks of binary code (shown 
enlarged at right). 

The CHAIRMAN. That is fed in at 40 words a minute, and that is
about as fast as we can do it?

Mr. Samson. That is one machine, now, sir. That disk there con- 
tains about 30 million bits of information on it, in the black line. 

The CHAIRMAN. Does this have the information here now?

Mr. Samson. Yes, sir; that is the dictionary now.

The CHAIRMAN. This is it?

Mr. Samson. That is the dictionary.

The CHAIRMAN. This is usable in the machine now?

Mr. Samson. Yes, sir.

The CHAIRMAN. This disk I hold in my hand?

Mr. Samson. That is from the machine.

The CHAIRMAN. Mr. Fulton will be interested in this. This is the
disk they use in their machine. They are bringing out 40 words per
minute. We have a sample. We are going to look at the original
sample.

How many words does that disk contain now?

Mr. Samson. That contains about 55,000 “entries.” 

The CHAIRMAN. At the present time?

Mr. Samson. Yes, sir.

The CHAIRMAN. You have 55,000 entries on this glass disk?

Mr. Samson. Yes, sir. It will hold up to 500,000 words, or 30 
million bits of information. The anulus of that ring is about four- 
tenths of an inch. 
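
Editorial illustration: the capacity figures quoted in this exchange can be related by simple arithmetic. The sketch below assumes, as a simplification, that "entries" and "words" are counted in the same unit; the testimony does not say so. The variable names are assumptions made for illustration.

```python
# Figures as quoted in the testimony; the variable names are illustrative.
DISK_CAPACITY_BITS = 30_000_000   # "30 million bits of information"
DISK_CAPACITY_WORDS = 500_000     # "will hold up to 500,000 words"
CURRENT_ENTRIES = 55_000          # "about 55,000 entries"

# Average storage available per word if the disk were filled to capacity.
bits_per_word = DISK_CAPACITY_BITS / DISK_CAPACITY_WORDS

# Fraction of the stated capacity in use, assuming one entry equals one word.
fill_fraction = CURRENT_ENTRIES / DISK_CAPACITY_WORDS

print(bits_per_word)            # 60.0
print(f"{fill_fraction:.0%}")   # 11%
```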

The CHAIRMAN. The anulus; what is that?

Mr. Samson. That is the distance between the outer edge and the 
inner edge, within the black lines. 

You see, this is an evolutionary approach, as you know, and new 
words are added periodically. The dictionary is growing, and this 
is the way we are proceeding.

Mr. Fulton. You are referring to this black circular band?

Mr. Samson. Yes, sir.

Mr. Fulton. Not to the diameter of the circular, when you say
anulus?

Mr. Samson. No, sir.

General Graul. Mr. Chairman, I would like to add for Mr. Ful-
ton’s benefit that the limitation of this complex of equipment is the 
input and the output. When I mentioned the figures, it was the 
normal typing speed of an individual on a Flexowriter who types 
40 words a minute on the tape, and the output is at the same speed on 
a Flexowriter. Those are the two areas that need improvement, and 
we have made good progress in both the print reader and the output
device.

Mr. Fulton. How do you get at the memory quickly?

Mr. Samson. This is an electronic search—optical electronical, we
call it.

The CHAIRMAN. I don’t see any words here at all. [Laughter.]

Mr. Samson. On page 5 of your chart, sir, you will see there is a 
magnified view of the code that is on there. These are black and 
white dots. 

Mr. Fulton. I was going to ask you who was putting the words on.

The CHAIRMAN. I have been handed here a sample of a magnified
portion of the disk. It is a film of a magnified reproduction of what
you have on the disk. That shows you the coding patterns represent-
ing the words themselves. 

Mr. Samson. Yes, sir. 

Mr. Fulton. How quickly could you, with an electronic scope,
pick words up on this?

Mr. Samson. Thirty words a second.

The CHAIRMAN. That is fast.

Mr. Samson. Yes, sir.

The CHAIRMAN. So feeding that in 40 words a minute the machine
could pick up 30 words per second?

Mr. Samson. Yes, sir. That is why we have the print reader de- 
velopment. As you know, on Friday, Baird-Atomic gave their pres- 
entation on the input print reader. 
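
Editorial illustration: the rates quoted in the testimony can be compared by treating input, dictionary lookup, and output as stages in series, so that the slowest stage sets the overall rate. That serial model, with no buffering between stages, is an assumption made for this sketch, as are the names used. The 6,000 words a minute and 100,000 words per hour figures come from the prepared statement.

```python
# Rates quoted in the testimony; names and the serial model are illustrative.
MANUAL_INPUT_WPM = 40            # typist preparing paper tape
PRINT_READER_WPM = 6_000         # projected Russian print reader
DISK_LOOKUP_WPS = 30             # dictionary disk, words per second
OUTPUT_WPM = 40                  # present output device
TARGET_WORDS_PER_HOUR = 100_000  # goal stated for the fall of 1961


def pipeline_wpm(*stage_rates):
    """A serial pipeline runs no faster than its slowest stage."""
    return min(stage_rates)


disk_lookup_wpm = DISK_LOOKUP_WPS * 60
current_wpm = pipeline_wpm(MANUAL_INPUT_WPM, disk_lookup_wpm, OUTPUT_WPM)
reader_speedup = PRINT_READER_WPM / MANUAL_INPUT_WPM
disk_words_per_hour = disk_lookup_wpm * 60

print(current_wpm)          # 40
print(reader_speedup)       # 150.0
print(disk_words_per_hour)  # 108000
print(disk_words_per_hour >= TARGET_WORDS_PER_HOUR)  # True
```

Under these assumptions the disk's lookup rate alone would be enough for the stated goal, which is consistent with the testimony that input and output are the limiting stages.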

The CHAIRMAN. That is right.

You say in your statement that there are three matters you are
working on.

Mr. Samson. Yes, sir.

The CHAIRMAN. You narrated them. The first is actually the print
reader.

Mr. Samson. The print reader, the input.

The CHAIRMAN. Next is the development of the translating mecha-
nism. 

Mr. Samson. The heart of the language translator, of course, is the 
glass disk dictionary you have there. 

The CHAIRMAN. That is the memory of the machine.

Mr. Samson. That is the dictionary, yes.

The CHAIRMAN. Then finally, what is the third?

Mr. Samson. The output. The composing, the putting back to-
gether again the graphs and charts and material.

The CHAIRMAN. You were going to show us an actual sample.

Mr. Samson. These are actual copies we have here, so we can’t call
them actual samples. We will submit one.

The CHAIRMAN. Don’t give us one that is doctored up. We would
like to have an original, with no changes, and no touching up. 

Mr. Samson. I don’t want to scoop IBM today, but I think you will 
have an original before the day is out. 

The CHAIRMAN. Can IBM assure us of that fact?

Dr. King. Yes, sir.

The CHAIRMAN. All right, fine.

Dr. King. Yes, sir, we have it.

Mr. Fulton. On these proposed new satellites, they are propos-
ing to have five bands at 400 words a second. Can you duplicate what 
they are trying to do by working out this kind of a process? 

Mr. Samson. No, sir. This equipment here is tailored for language 
processing. I would like to say that as far as the duplicating of the 
communication piece of equipment, this has no similarity to that 
whatsoever, sir. 

Mr. Fulton. So it is not in the same field as the work on the com-
munication satellite?

Mr. Samson. No, sir.

The CHAIRMAN. Well, now, let’s see, we have some questions here.
I had some marked on your statement. What are these two com-
mittees you refer to: Subcommittee on Mechanical Translation, and
the other Committee on Documentation of the U.S. Intelligence Board.
What is that?

Mr. Samson. This is a subcommittee formed by the Committee on
Documentation to keep up with the efforts of the field.

The CHAIRMAN. Who is the member of that?

Mr. Samson. I am the member of that, sir. 

The CHAIRMAN. You are the Air Force member?

Mr. Samson. Yes, sir.

The CHAIRMAN. What about other agencies or services? Do they
have members?

Mr. Samson. Yes, sir, the State Department, Army, Navy, Na- 
tional Science Foundation, National Security Agency, and CIA. 

The CHAIRMAN. That is it?

Mr. Samson. Yes, sir.

The CHAIRMAN. Six members?

Mr. Samson. Yes, sir.

The CHAIRMAN. Who is the chairman?

Mr. Samson. Mr. Paul Howerton of the Central Intelligence
Agency is the chairman.

The CHAIRMAN. What is the other committee that you refer to?

Mr. Samson. This is the Interagency Committee on Machine 
Translation Research that was formed by the National Science 
Foundation. 

The CHAIRMAN. What is the difference between those two commit-
tees?

Mr. Samson. Well, the National Science Foundation committee
is composed of members of Government agencies who sponsor re-
search in machine translation. The other committee, the Subcom-
mittee on Machine Translation, is formed of members that have an
interest—all Government agencies that have an interest in machine
translation. The distinction is that one is sponsoring machine trans-
lation research, and the other is composed of all agencies that are
interested in it.

The CHAIRMAN. Now, who are the members on the second com-
mittee?

Mr. Samson. That is composed of the Central Intelligence Agency;
the National Science Foundation, which chairs the meeting; the Navy,
the Army, and the Air Force.

The CHAIRMAN. That is about the same.

Mr. Samson. That is five, I believe. 

The CHAIRMAN. Yes, five. The other one has six?

Mr. Samson. Yes, sir.

The CHAIRMAN. You leave out one agency, that is all, in the second
committee. How long have those committees been operating?

Mr. Samson. The Interagency Committee has been operating since
the first of the year. The Subcommittee on Machine Translation has
been operating about a year.

The CHAIRMAN. I could ask you a few questions here. Why did the
Air Force spend such a large portion of the funds on equipment de-
velopment?

Mr. Samson. Well, as I stated in my formal presentation, the Air 
Force recognized that the need of automatic translations required a 
large capacity, high-speed information for a Russian-English dic- 
tionary, an automatic reader for Russian input, and an automatic
print out of English translation. This equipment had to be developed.
As you notice on the flip chart, 70 percent of the Air Force funds have
gone into equipment development.

The CHAIRMAN. You feel when your equipment is developed that
you will have the problem licked then? 

Mr. Samson. We will have the equipment problem licked, but we 
feel the linguistic research is an evolutionary approach. We have the 
experimental tool here to continue this, and have an output that is 
satisfactory to the user. 

The CHAIRMAN. Do you call your present output satisfactory for
practical purposes?

Mr. Samson. Well, it can be used—let me put it that way.

The CHAIRMAN. Yes. You are using it, aren’t you?

Mr. Samson. No, we are using it in the sense of determining what 
is wrong with our input or our dictionary, and constantly building 
up a new dictionary. Also, we feel that the product right now 
could be used in a very, very limited sense, in that it does show the 
capabilities of the equipment that it prints out in red when there is 
no word to compare in the dictionary. 

This is a flag that indicates you don’t have the word in the dic- 
tionary, or it might be something new—a new technique somebody is 
using. 
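
Editorial illustration: the flagging behavior described above, where a word with no dictionary entry is printed in red, can be sketched as a lookup that reports whether a match was found. The sketch is hypothetical. Square brackets stand in for the red ink, and the entries and the names lookup and render are assumptions made for illustration.

```python
# Hypothetical placeholder entries, not the actual dictionary contents.
DICTIONARY = {"novyj": "new", "metod": "method"}


def lookup(word):
    """Return (text, found); a missing word comes back unchanged."""
    if word in DICTIONARY:
        return DICTIONARY[word], True
    return word, False


def render(words):
    """Join the output, bracketing words that had no dictionary entry."""
    pieces = []
    for word in words:
        text, found = lookup(word)
        # Brackets stand in for the red printing described in the testimony.
        pieces.append(text if found else f"[{text}]")
    return " ".join(pieces)


print(render(["novyj", "metod", "sputnik"]))  # new method [sputnik]
```

A flagged word then prompts the decision discussed in the next exchange: add it to the dictionary or translate it by hand.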

The CHAIRMAN. Your decision is either to put that word in the
dictionary, or to consider it so relatively unimportant that you just
simply make a manual translation?

Mr. Samson. Yes, sir.

The CHAIRMAN. Do you feel that you are making satisfactory
progress in this particular field?

Mr. Samson. Very definitely, sir.

The CHAIRMAN. How long will it be before we will actually use
the product of this machine?

Mr. Samson. By November of 1961 we are going to have this Air
Force translation complex complete and integrated.

The CHAIRMAN. 1961?

Mr. Samson. 1961 is for what we call the experimental phase of 
it; that is, integrating all the equipment we have together now. This 
is going to prove the feasibility of an automatic language translation 
complex. At that time, we think we will have enough material to 
write specifications for a production model of a translation complex. 

The CHAIRMAN. Is that sample you gave us doctored up a little
bit?

Mr. Samson. No, sir.

The CHAIRMAN. It is photographed. But don’t you feel that has
a practical use?

Mr. Samson. Yes, sir. There are indications that the output right
now—not to call it a translation—but the output, or the rendering of
a translation by a machine, has use right now. It is an evolutionary
approach. There is a program to continually improve the dictionary,
the language processing equipment, and the output.

I think the output will show a greater improvement through the 
years. That is, each month, as we go along, or each 2 or 3 months, 
you will see the output is getting better. 

The CHAIRMAN. How much money has the Air Force put into this
project?

Mr. Samson. In automatic language translation research and de-
velopment we put in $4.8 million.

The CHAIRMAN. $4.8 million, over the years?

Mr. Samson. Since 1955.

The CHAIRMAN. That is the Air Force alone?

Mr. Samson. Yes, sir.

The CHAIRMAN. Do you know of any organization that is develop-
ing special purpose equipment for automatic language translation?

Mr. Samson. No; I know of none outside the Air Force. 

The CHAIRMAN. Should a computer be designed especially for ma-
chine translation which would eliminate all of the features of stand-
ard computers not required for this one type of program?

Mr. Samson. Yes. I feel that the present general purpose com- 
puters are not suited for automatic translation of large masses of lan- 
guage material for economic reasons. However, standard computers 
are suitable for research to develop automatic translations and tech- 
niques, and I think this is essential for the development of the lin- 
guistic analysis. 

The CHAIRMAN. You think the present computers will do the job
all right?

Mr. Samson. No; they will do the job for linguistic research, I be-
lieve, but not for automatic translation.

The CHAIRMAN. You will have to have a special computer for that;
is that right?

Mr. Samson. A special language-processing machine would be
preferred.

The CHAIRMAN. Since your program covers automatic print read-
ing through the automatic printout of the translation, would it be
correct to say that this is a system approach?

Mr. Samson. Yes, sir. The Air Force research is a systems ap- 
proach. I would like to point out here it is not classed in the terms 
of systems, like an intelligence data handling system and has a num- 
ber on it or anything like that, but we feel the approach we have 
taken here is a systems approach, because it is the only way to achieve 
an early solution to usable translation. Because we have the input, 
the translation, and the output we feel we are using the system 
approach. 

The CHAIRMAN. Tell me this: What are you doing in the area of
linguistic research?

Mr. Samson. Well, in this area, you will notice that about 30 per- 
cent of our funds has gone into this area, and 80 percent of that 30 
percent is in grammar and meaning research and about 20 percent is 
in dictionary research. We have quite a few of what I call long-
range research contracts or approaches, and I would like to mention
just a few of them, and where these studies are going on. 

One is with the University of Milan, in Italy. This is long-range 
research in applied semantics, in the study of the arrangement of 
words for conveying information. The Cambridge Language Re- 
search Unit, of Cambridge, England, is another long-range research 
effort. The Harvard University study is with Dr. Anthony Oet-
tinger and is on language research which is an extension of the work
at NBS. The University of Indiana, the University of Washington,
IBM, and Thompson Ramo Wooldridge. We include the last two in
there because of the fact that necessary linguistic research has to go 
on in dictionary compilation. 

Short-range research work is going on at Syracuse University. 

The CHAIRMAN. How much postediting do you plan on after it has
reached its development?

Mr. Samson. I would like to say, first off, I agree with Professor 
Dostert. Human translation or machine translation will always 
require some postediting, but I feel this will be done by the reader. 
Because there exists no objective yardstick today for measuring the 
quality of translation output, one cannot say how much postediting 
is actually required in any particular case. I would say, however, 
postediting is done to improve the flow of words for easy reading. I 
feel this is somewhat of a luxury at the present time. 

General Graul. Mr. Chairman, if I might add, I believe that in
many translations, especially in the scientific area, the scientist would 
want to go to the original document, such as German, and get his 
own meaning out of that. So in this case if the scientist knew 
Russian, he could go to the original and get his own sense out of 
those words. 

The CHAIRMAN. Well, even if he didn’t know Russian fluently, he
could go to the Russian manuscript and magnify the portion that he
is doubtful about and run down the semantic situation.

General Graul. Yes, sir; that is correct.

The CHAIRMAN. On that particular portion, he might translate a
sentence, or perhaps a paragraph.

General Graul. Yes, sir.

The CHAIRMAN. Maybe that is what he would be interested in.
You would expect that to continue, regardless, wouldn’t you?

General Graul. That is correct, sir.

The CHAIRMAN. Could you give us a breakdown of the yearly
expenditures? You say the total amount is $4.8 million?

Mr. Samson. Yes, sir.

The following expenditures are broken down by fiscal year begin- 
ning in 1956 and are further broken down into equipment development 
cost and linguistic studies. In the fiscal year of 1956 the total amount 
was $400,000, and $100,000 of this was spent on linguistic studies. In 
the fiscal year of 1957 we spent a total of $700,000; $200,000 of this 
was for linguistic studies. 

In the fiscal year 1958, we spent $800,000; $350,000 of this was for 
studies. 

In fiscal year 1959 we spent $1,500,000; $400,000 of which was for
linguistic work. 

In the fiscal year 1960 we spent $1.4 million; $300,000 of which is 
for linguistic studies. 

That totals $4.8 million, sir.
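
The yearly figures just given can be checked with a short tabulation. What follows is only a sketch in Python: it assumes that whatever was not spent on linguistic studies in a given year went to equipment development, and the names SPENDING and summarize are illustrative and not part of the testimony.

```python
# Figures as stated in the testimony, in thousands of dollars.
# Each entry: fiscal year -> (total spent, portion for linguistic studies).
SPENDING = {
    1956: (400, 100),
    1957: (700, 200),
    1958: (800, 350),
    1959: (1500, 400),
    1960: (1400, 300),
}


def summarize(spending):
    """Return (total, linguistic, equipment) in thousands of dollars."""
    total = sum(year_total for year_total, _ in spending.values())
    linguistic = sum(ling for _, ling in spending.values())
    # Assumption: the remainder of each year's total is equipment development.
    equipment = total - linguistic
    return total, linguistic, equipment


total, linguistic, equipment = summarize(SPENDING)
print(f"total:           ${total / 1000:.2f} million")
print(f"linguistic:      ${linguistic / 1000:.2f} million")
print(f"equipment:       ${equipment / 1000:.2f} million")
print(f"equipment share: {100 * equipment / total:.1f} percent")
```

Under that assumption the yearly totals do add up to $4.8 million, and roughly seven-tenths of the five-year total falls on the equipment side.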

The Chairman. Yes. What is the amount you need to carry on
your program?

Mr. Samson. I feel that the funding in the past has been adequate, 
and I am sure that the funding in the future will be supplied based 
on past experience, but I would like to say this, that without the 
support of industry and universities and other Government agencies
in here for total efforts in the machine translation field, I don’t believe 
it would have been adequate. 

The Chairman. But with that support you think it will be
adequate then?

Mr. Samson. Yes. 

The Chairman. I think you testified that you would not be ready
for production before December of this year. 

Mr. Samson. No. November of 1961 is when we should have the 
translation complex with all the equipment integrated to show the 
feasibility of doing automatic language translation.

The Chairman. But you said something about December this year.

Mr. Samson. December this year is the integration of the print
reader that is now being built by Baird-Atomic, Inc., which is going
to be delivered in September. That is the integration of the equip- 
ment with the English translator which is at IBM. 

The Chairman. I am going to call on Colonel Dillon here. Do you
have any questions?

Colonel Dillon. Yes, sir. I would like to ask General Graul,
commander of the Rome Air Development Center, if you could 
add to the importance of this automatic translation system and relate 
it to the importance of U.S. ability to cope with the cold war effort. 

General Graul. One principal importance is the timeliness with
which such a translation complex could get the information to the 
scientist and engineers. As you know, now the period may be any- 
thing from 5 to 8 months with even partial translations. We are 
just doing a fraction of the output of Russian literature by manual 
translation. 

For example, at Rome Air Development Center, I think we have 
three engineers who can read Russian, and this is probably the equiva- 
lent of what we would get in 2 years’ language study. They are not 
experts. If we could give them timely translation of scientific ma- 
terial I think we could get many ideas and know the direction in 
which the Soviets are going in their research, and so improve our 
total program in ARDC in all areas. 

Colonel Dillon. Mr. Samson, what are the pertinent aspects of the
Air Force program on automatic language translation?

Mr. Samson. Well, the pertinent aspect of the translation program 
is the fact that we are equipment oriented in the sense we are spend- 
ing 70 percent of our effort on equipment. We believe this is the only 
way to achieve economical translation. 

Colonel Dillon. I was a little concerned over the statement by the
Department of the Army the other day, saying that they are about 
to start on the automatic input research. I also asked this question 
of Baird-Atomic, when they appeared on Friday. I would like your 
comments on the ability of the Baird-Atomic machine to fulfill most 
of the requirements of the automatic translation research program. 
Would you comment on that? 

Mr. Samson. Yes, sir. Well I feel that the Baird-Atomic machine, 
as developed now, even though it is an experimental machine, has 
the capability to become an input, or its output is actually an input 
for just about any translation machine I can think of today. The
Army’s approach to automatic print reading included some of the
composing techniques we prefer to keep separate now. 

Also the Army wants to develop a machine that would be used 
for more general use, but I feel that the Baird-Atomic machine has a 
tremendous use for just about any machine translation complex you 
can think of today as an input. 

Colonel Dillon. Have you seen the Army’s requirements for an
automatic reading machine?

Mr. Samson. Last week at the meeting of the Subcommittee on 
Machine Translation, the Army representative read off the specifica- 
tions which were general in nature at that time for that type of com- 
mittee meeting. So I did see the specifications, but in terminology 
that we call specifications, I don’t believe I could call them specifica- 
tions, because they did not elucidate enough so somebody could bid 
on that. 

Colonel Dillon. As a part of this subcommittee have you provided
the Army with the information on the Baird-Atomic program?

Mr. Samson. I invited the Army to visit Baird-Atomic. I did not 
go out of my way to provide them with any literature on it, because 
as Baird mentioned, right now there isn’t anything published. But 
that information is available through normal channels, of course.

Colonel Dillon. Has the Army requested this information of you?

Mr. Samson. No, sir.

Colonel Dillon. That is all, Mr. Chairman.

The Chairman. Mr. Fulton.

Mr. Fulton. Referring to your statement on page 4 of the statement
by General Graul:

Since the intelligence and scientific activities of the Chinese Communists are 
becoming increasingly more important, the research and development program 
for automatic language translation of Chinese into English would be of great 
importance to the United States. 

The inference from that statement is there is no such project now 
in being.

General Graul. That is correct, as far as we know in the machine
translation area. We do have as part of our program for fiscal year
1961 funds, I believe it is $90,000, to do research work in Chinese
translation, and as you well recognize, the problem there of
transferring their ideographs to some form that can be put in a machine is
ferring their ideographs to some form that can be put in a machine is 
quite a difficult one. But we do plan to start that with fiscal year 
1961 money. 

Mr. Fulton. Is that enough money for you to start it, considering
how difficult the Chinese language is?

General Graul. Yes, sir, we believe it is. And it is in line with
other contracts with universities in research similar to that.

Mr. Fulton. In what area of development will that money be
spent? How will it be divided?

General Graul. That will be essentially for research.

Mr. Fulton. Language research?

General Graul. Language research, in some technique of
transferring the ideograph to a form that can be put into the machine.
That will be the problem. As you know, in transmitting by tele- 
graph, they do have a codebook where a Chinese character has a 
corresponding number, and they transmit the number. That is their
way of transmitting the language by telegraph, or teletype. 

Mr. Fulton. I am going to ask Colonel Dillon to look into the
possible use of these foreign currencies that the United States has, and
counterpart funds, under both the Public Law 480 program, as well 
as under the mutual security program. Many of these projects you 
have are language projects, and we have people we are keeping in 
idleness as refugees, who certainly can be doing something. In Hong 
Kong, where the conditions are so bad, there are certainly intellectual 
people there that could be used in such a program. 

Is there any plan for using any people such as that? 

General Graul. No plan by us at present.

Mr. Fulton. I agree with you, and I am glad to see that you are
approaching on a practical basis and using a machine to get input 
and output, rather than going into tremendous theoretical research 
in, I guess you would call them affluences in language. I want to 
compliment you on being practical. I was in Navy Air in World 
War II, and the question is on signals. Why don’t you have a 
separate program on signals? When our fliers are flying in the 
Strategic Air Command, they are really flying blind, as far as Rus- 
sian current signals are concerned. Why don’t you try for some 
sort of a packaged equipment that would take on the job of trans- 
lating Russian signals immediately? I think you will find there 
is a set pattern in them, and not a very broad pattern. The Ameri- 
can flier would immediately get a translation of the ordinary signal 
that the Russian uses, either from the Russian plane or command post,
and this would assist our pilot in programing his flight. 

Why don’t you have a special program on that to get instantaneous 
communication and translations on signals for airplanes? I can see 
the value of silence, but I wonder why we don’t have just that? It 
would be a basic English translation for automatic transmission on 
an instantaneous basis where it is needed in an emergency. 

For example, you could have a group of Russian planes going to 
gang up on one of our planes, and our pilot is there as a sitting duck. 
He may have the best equipment, but if he is attacked from these 
angles it might be very wise for him to know that basic thing even 
though the articles, participles, and various things aren’t included 
in the message. 

General Graul. Mr. Fulton, I think we are attacking a small part
of the problem. The part of the research for the coming year is 
how we can put spoken words into the machine. That is actually 
speaking into something, such as a microphone that will immediately 
transfer it into the machine and do a translation from voice. That 
would be one of the steps that would have to be solved in rapidly 
translating speech. 

Mr. Fulton. I am always appreciative of science and scientists,
but I am also always appreciative of the value of simple people. 
I notice by the statement of Robert Samson, that he refers to a his- 
torical date in this program, July 1949, and refers to Dr. Warren 
Weaver of the Rockefeller Foundation. Actually, the particular 
means the general had just spoken of was recommended in the Febru- 
ary 1910 issue of Popular Mechanics, with a diagram, and with a 
method of dialing along the lines of our modern dial telephone in
order to get translation of language by telephone. 

So on the basic history of this, I always appreciate the fact that 
you go back to a scientist in 1949, where you are getting your earliest 
antecedents, but actually a lot of this was in Jules Verne and the 
Popular Mechanics approach from 1910 to 1933. So that I feel some- 
thing along that line should be encouraged, as well as emphasis on 
the Chinese translation. 

I looked into that, and I believe out of several hundred scientific 
documents being put out monthly by Chinese Communists, we are 
translating between 20 and 30. There could be a tremendous amount 
and we don’t know of it. 

General Graul. I quite agree.

Mr. Fulton. Likewise, on Russia; we have an exchange in our library
of 85,000 to 90,000 various items each year, nevertheless the
translation, as you point out, is 6 to 8 months beyond.

General Graul. Yes, sir.

Mr. Fulton. Secondly, it isn’t really being covered adequately.

General Graul. That is correct. Mr. Fulton, I would like to add
one thing that may have been overlooked in the many words we have 
said. In the glass plate you have, with approximately 50,000 words 
on it now, with endings, it is the equivalent probably of 300,000 to 
500,000 words. 

Mr. Samson. These are Russian entries we are talking about? 

General Graul. Yes, the Russian entries.

Mr. Shiner. Approximately 300,000.

General Graul. The additional point I want to make is we are
adding to this dictionary store 1 to 4,000 words each month. As the 
research develops new words go into the dictionary, these are added, 
and we add 1 to 4,000 words each month to the photoscopic dictionary. 
We hope in 1961 to have an equivalent capacity on that disc of 
1,500,000 words. These are base words plus endings. This capacity 
of 1,500,000 words is a sizable dictionary. 

Mr. Fulton. It seems to me it would be appropriate to use these
foreign funds the United States has, as well as the counterpart funds, 
for this kind of a purpose. Because there are many scholars abroad 
with peculiar language capabilities that are actually on the verge of 
starvation. They cannot fit into a new economy, and they have a 
peculiar capacity that is not being used. 

Last year I had an amendment put on to the Mutual Security Act, 
and it was then revised when Mr. Hubert Humphrey of the Senate 
had a revised amendment when it went over there, so it was a causative 
amendment finally that permits the use of these foreign currencies, 
without appropriations in dollars for the first time. Otherwise, it 
would be charged against your dollar budget. I wish you people 
would look into that amendment, and see what can be done. It would 
give you a tremendous expansion in this program. 

You see, we are doing it with a few people, but we might be able, 
with these millions and millions of dollars of foreign currency just 
lying idle, to have quite an extensive research program on the basics 
of language. When you talk about four people with the ability to 
be bilingual in both English and Russian, that is really just a flyspeck 
on the window compared with what we could have if we used the 
funds we have just lying idle and nobody paying any attention to it.
That is all, Mr. Chairman. 

The Chairman. I want to say at this time we reserved the last
hour here to the next witness, and so, General, we want to thank you 
for a very fine statement. Mr. Samson also gave a very fine state- 
ment, and we have gained a great deal of information from what you 
have brought us this morning. 

Mr. Samson. Mr. Chairman, I would like to correct a part of my 
testimony, if I may. 

The Chairman. All right.

Mr. Fulton. Now is the time to do it, after he compliments you.

Mr. Samson. Right, sir. 

In all fairness, in my absence during these preparations of the 
literature here for the committee, a letter was sent to the Army from 
my office on the 26th of April 1960. The letter contained informa- 
tion on the Baird-Atomic print reader. This letter was sent to the 
Army. This is in response to Colonel Dillon’s previous question. 

The Chairman. Do you have the letter? Do you want the letter
put in the record, too?

Mr. Samson. I don’t believe so; merely to correct the answer to
Colonel Dillon’s question.

Mr. Fulton. Let’s put the letter in the record.

The Chairman. You have no objection to putting the letter in the
record, do you?

Mr. Samson. No objection.

The Chairman. If there is no objection, we will put it in the record.

Thank you very much, gentlemen, for your coming here.

(The letter referred to is as follows:)


HEADQUARTERS, ROME AIR DEVELOPMENT CENTER, 
AIR RESEARCH AND DEVELOPMENT COMMAND, 
U.S. AIR FORCE,
GRIFFISS AIR FORCE BASE,
New York, April 26, 1960. 
Subject: Print Reader for Language Translation. 
To: Commander, Army Research Office. Attention: Lt. Col. D. A. Kellogg, 
Arlington Hall Station, Arlington, Va. 


1. In response to inquiry dated March 28, 1960, the following information is 
supplied relative to the Russian print reader being developed by Baird-Atomic, 
Inc., and Rome Air Development Center.

2. The machine for reading Russian, known as the Converter Group, Print-to- 
Digital AN/GSQ-29, has the capability for reading 12 type fonts, up to 100 
characters per font, of arbitrary choosing. Twelve fonts are considered by us 
to be adequate for handling the Russian styles, special symbols, and possibly 
some Greek letters. By handling this number of fonts, the machine can still 
operate at a speed consistent with the operating requirements of the AN/GSQ-16 
language translator. Because of the ease with which type fonts can be switched, 
there is no reason for determining exact type styles at this time. In this 
connection, the Russian type study by New York University (ARDC Contract 
AF30(602)-1824) is being used as a guideline for the selection of Russian
fonts. 

3. Because all the technical data is not complete, the handling of equations, 
graphs, charts, and pictures is being treated as a separate problem. The Baird- 
Atomic method for recognizing nontextual material is as follows: The source 
document is quickly preedited by means of an operator placing a narrow, black 
segment of tape on the left margin of the source document, opposite each block 
of graphic material, prior to photographing. This black segment provides a two- 
fold purpose: First, it causes continuous film advance past nontextual occur- 
rences, with reading inhibited; and second, it provides a coordinate signal for a 
graphics composing machine to follow. In this connection, a study is being
performed by Syracuse University under contract AF30(602)-2091 to analyze both
the commercial state of the art and the actual Air Force need for the handling 
of graphics. 

4. Transliteration is being handled as follows: The Russian print reader passes
all recognizable material to the AN/GSQ-16 language translator. Any word
which cannot be found in the photoscopic disc dictionary is converted from
Cyrillic to Arabic alphabet and bypassed to the output, which prints it in red.
The output copy shows all transliterated words in red, and the translation

5. In connection with your comment in your paper concerning the differentiating
of Russian characters ь and ы, the Baird-Atomic method of reading can
treat ы as a real, individual character.

ROBERT SAMSON,
Chief, Data Utilization Branch, Intelligence Laboratory.
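
The handling of unknown words described in paragraph 4 of the letter can be pictured with a small sketch in Python. Everything specific in it is an assumption made for illustration: the two-entry DICTIONARY stands in for the photoscopic disc dictionary, whole words are looked up rather than stems and endings, the TRANSLIT letter table is one common romanization and not necessarily the one the equipment used, and the labels "red" and "normal" merely mark which words were transliterated.

```python
# Hypothetical stand-in for the photoscopic disc dictionary (two entries only).
DICTIONARY = {"наука": "science", "машина": "machine"}

# One possible Cyrillic-to-Latin letter table; the letter does not give one.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(word):
    """Convert a Cyrillic word letter by letter; unknown symbols pass through."""
    return "".join(TRANSLIT.get(ch, ch) for ch in word.lower())


def render(words):
    """Return (text, label) pairs: translated words are 'normal', the rest 'red'."""
    output = []
    for word in words:
        key = word.lower()
        if key in DICTIONARY:
            output.append((DICTIONARY[key], "normal"))
        else:
            # Not in the dictionary: bypass translation and transliterate.
            output.append((transliterate(word), "red"))
    return output


print(render(["наука", "спутник"]))
```

The point of the design, as the letter describes it, is that a word missing from the dictionary is never silently dropped; it reaches the reader in a recognizable form and is visibly marked.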

Mr. Samson. Thank you, Mr. Chairman. 

General, thank you. 

The Chairman. We appreciate it very much.

We will take up next the statement of Dr. Gilbert King, manager of
Experimental Systems Research of the IBM Research Center.

Dr. King, if you will have a seat—and do you have anyone with you 
from the IBM center ? 

Dr. King. Yes, sir.

The Chairman. You have Dr. L. R. Micklesen; Richard W. Moss;
Mr. David F. Loeb; Mr. Stephen H. Beach, all from the research
center?

Dr. King. Yes, sir.

Mr. Fulton. Do you have their titles?

The Chairman. We have a short biography here on Dr. King.

Mr. Fulton. The members with him, will he describe what they do?

The Chairman. I will get Dr. King to do that, if you will.

If you care to give us anything of your background before your
opening statement, fine, and then indicate what the background of the
other witnesses are.

Well, Dr. King, if you wish to proceed, we will be glad to hear from
you.

Dr. King. Thank you, sir.

I think I might as well be very brief about my background; 
in summarizing that I have done a great deal of work in applying 
automatic computing machinery to various kinds of problems, other
than actual numerical processing. Some of these problems are very 
much like those which arise in the question of automatic translation 
of languages. 

My staff have likewise various backgrounds in pioneering work in 
the application of machines to a variety of nonnumerical problems. 

I have on my right Mr. Beach, as counsel for the corporation.

Dr. Micklesen, on my left, is our linguist, and has considerable
background in machine translation.

Mr. David Loeb, who has been concerned chiefly with the uses of the
English output. 

And Mr. Richard Moss, who as a former intelligence officer, is ac- 
quainted with the intelligence aspects of this project. 

The Chairman. And they are all employees of the International
Business Machines Corp., aren’t they?

Dr. King. Yes, sir; they are.

The Chairman. Fine. If you will proceed with your statement,
it will be fine.

Dr. King. Thank you, sir.


STATEMENT OF DR. GILBERT W. KING,8 DIRECTOR OF EXPERIMENTAL
SYSTEMS RESEARCH, RESEARCH CENTER, INTERNATIONAL
BUSINESS MACHINES CORP.; ACCOMPANIED BY
RICHARD W. MOSS, DR. L. R. MICKLESEN, STEPHEN H. BEACH,
AND DAVID F. LOEB


Dr. King. Mr. Chairman and members of the committee, I appreciate
having been asked here to speak for the International Business
Machines Corp. A very substantial program on the automatic proc- 
essing of languages is currently being carried out at our research cen- 
ter in the area of “information retrieval.” A large portion of the 
program is directed toward the automatic translation of one language 
to another, and about half of this effort is supported by the U.S. Air 
Force through the Intelligence Laboratory of the Rome Air Devel- 
opment Center. We feel that translation is in many ways typical 
and will turn out to be the central problem in the retrieval area. 

It has been our aim in automatic translation to consider all aspects 
of the problem and to make use of the skills and backgrounds of men 
who have pioneered before in the application of machines to this kind 
of nonnumerical material. Thus, we have reached a point where we 
have the main skeleton not only of a theory but of the operational 
requirements and the equipment to carry out useful translations in 
real time. By “real time” I mean the time scale on which such trans- 
lations are actually needed by the Nation. 

The first task, especially for a long-range program, is the
establishment of a “mathematical model” suitable for the application, as is
done in operations research. We have studied the work on the prob- 
lem being carried out at the Cambridge Language Research Unit in 
England, made visits there, and engaged them with IBM funds as 
consultants. The ideas of this group are very stimulating and sophis- 
ticated, and have led to the proper kind of mathematics to use in our 
model. The work of the University of Milan is not known to us in 
detail, but its general principles are very closely connected to our 
model. To date we have no indication of work of this particular 
nature done by the Russian groups. 


8 Gilbert W. King: Manager, Lexical Processing Research. 

Education and professional affiliations: B.S. in chemistry, Massachusetts Institute of 
Technology, 1933; Ph. D. in chemistry, Massachusetts Institute of Technology, 1935.

Experience: After receiving his doctorate, Dr. King was a national research fellow at 
the California Institute of Technology, Harvard, and Princeton, an instructor at Yale, 
and a research associate at MIT. His interests were in quantum and statistical mechanics 
and information theory applied to infrared spectroscopy. During the war he served as 
operations analyst with the Office of Scientific Research and Development and received 
the Army-Navy Certificate of Appreciation. 

Dr. King’s industrial associations have been with three corporations: Arthur D. Little, 
International Telemeter, and IBM. At International Telemeter, where he was chief
engineer, he was responsible for the development of the AN/GSQ-16 (XW-1) Photostore for
lexical storage. His present duties at the IBM research center include the direction of 
all programs in automatic language translation and information retrieval. 

Dr. King has been an invited lecturer at Harvard (1947); national chairman for the 
1956 meeting of the Association for Computing Machinery; and associate editor of the
Journal of Chemical Physics (1948-50). He has served on the visiting committee for the
MIT Corp. and on the USAF Beacon Hill intelligence study. He has also been a
consultant to the U.S. Navy at China Lake, Calif., and to the Institute for Defense Analysis.
The details of the model are being established by our own work and
by close study of the work of other groups in this country. By details
we mean a precise formulation of what steps need to be carried out in
the translation process, and the identification of the clues which are
needed to initiate these steps. The clues relate to both grammar and
meaning. Information that can be assessed and possibly assimilated
to the IBM formulation is best obtained from the actual programs
written for machines. These have not been available to us, but the
general nature of the steps to be taken has been described in reports
by Georgetown University, Ramo-Wooldridge, the Rand Corp., Harvard,
and the University of California, all of which we have and read.


Some of these efforts are very ambitious in their commitment to
complete analysis of whole sentences. It is not clear whether the
programs being produced are of sufficiently sophisticated nature to
perform such analysis on most sentences in a typical text. However,
in 1950 Abraham Kaplan of UCLA and the Rand Corp. pointed out
that 85 percent of the ambiguities in Russian could be solved by clues
from adjacent words. We have incorporated this observation in our
model, and have developed a method to translate phrase by phrase.
Naturally, our theoretical group is working very hard in elaborating 
the details of the model to extend the range of the clues from the 
phrase to the whole sentence. 

Apart from these academic studies on languages which IBM is 
making with its own research funds, we have another effort examining 
the operational requirements which will have to be met if any real-life 
translations are to be made. 

Even though no one yet knows the full details of what needs to be 
done to get good translations from a machine, certain requirements 
have become quite apparent from time to time. The major one is for 
a very large memory to store a dictionary, with reasonably fast access 
to its entries. A suitable memory, the AN/GSQ-16 Photostore,
which was built with the support of Rome Air Development Center,
is now fully operational and is indeed performing its function as a
dictionary. In addition, the method designed for looking words up is 
such that the present equipment is capable of far more than a word- 
for-word rendering. It has, in fact, proved possible to gain a great 
part of the advantage which Kaplan predicted by exploiting the local 
context of the word being looked up. Further exploitation is quite 
apparent to us and will be introduced shortly. 

It is worth pointing out that problems in both grammar and meaning 
are resolved on the local basis with the addressing scheme of this 
memory. 

An important design specification of the memory is speed, and this 
is determined by the ultimate requirements of a translating system. 
Our early investigations showed that the Soviets alone were printing 
100 words per second, which I feel at least 1 person in this country 
should read, and the memory was designed to have this rate of access. 

Recently a more comprehensive survey of the translation require- 
ments of this country was made by an essentially independent group, 
the Planning Research Corp., under contract to IBM. Results
supported the 100-word-per-second requirement for Russian. In
addition, requirements were established for other languages.
I have not made a part of my written testimony a summary of the
preliminary survey made by the Planning Research Corp. Instead
of reading it aloud I would like to submit copies for inclusion in the
record.

The Chairman. If there is no objection, we will put it in the record.

(The material referred to is as follows:)


SUMMARY OF THE PRELIMINARY SURVEY OF THE NEED FOR LANGUAGE TRANSLATION 
IN THE U.S. GOVERNMENT 


Estimates of the accelerating publication of foreign literature and increasing ac- 
cessions of those publications indicate that U.S. Government requirements for 
translations will become steadily more acute, and that present human-translation 
capabilities will become increasingly inadequate in the face of the demand. The 
world production of books, for example, was shown to be well over a quarter of a 
million titles already, and that almost 100,000 of these were published in the So- 
viet bloc. By 1970, the U.S.S.R. alone will produce 100,000 books, while bloc 
production may well reach 160,000 titles of 12 billion words. The relationship be- 
tween foreign-language publication, U.S. accessions, intelligence community re- 
quirements, and current translation efforts for the U.S.S.R. is summarized in 
the table below. It should be noted that although only about 5 percent of U.S.S.R. 
publications are accessioned by the U.S. Government, by 1970 the present rate of 
procurement will add about 2.5 billion words from Soviet books and periodicals 
to U.S. collections each year. 

Under the constraints of current human-translation capabilities and costs, U.S. 
intelligence agency requirements are kept to an extremely modest level. Require- 
ments total only six-tenths of 1 percent of Russian publications and 13 percent 
of current accessions. Current translation efforts are able to fulfill even less of 
the demand: 15 percent of the intelligence requirements, 2 percent of current
accessions, and only nine one-hundredths of 1 percent of Soviet production. Even
if a more moderate criterion—the need to translate only critical intelligence in- 
formation—were adopted, current accessions from the Soviet bloc and Communist 
China alone would require the translation of about 250 million words per month. 
Such a requirement would necessitate an expansion of the pool of private and 
governmental translators to over 50 times its present size, and when the trans- 
lation of intelligence materials from areas outside the Sino-Soviet bloc are added 
to the burden, the cost in manpower and funds becomes prohibitive. 

Finally, it should be noted that the potential for machine translation of foreign 
languages developed in this study takes into consideration only requirements 
arising from intelligence activities in the U.S. Government. Translation require- 
ments associated with other governmental activities and nongovernmental de- 
mands would expand the potential far beyond the present estimate. 


            Comparative volumes of Russian publications

                           Books and    Periodicals   Total      Proportional
                           monographs   (million      (million   relationship
                           (million     words)        words)     (percent)
                           words)

A. Production, 1958-59        5,175       22,500      27,675
B. Production, 1970           7,500       56,000      63,500
C. Accessions, 1958-59          923          363       1,286     C/A=5.
D. Accessions, 1970           1,689          818       2,507     D/B=4.
E. Requirements, 1960            62          108         170     E/A=0.6.
                                                                 E/C=13.
F. Translation, 1960             13           13          26     F/E=15.
                                                                 F/C=2.
                                                                 F/A=0.09.
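
The proportional relationships in the last column follow from the totals, and a short Python sketch can confirm them. It takes the translation total for 1960 as the sum of its two components (13 + 13 = 26 million words); the names TOTALS, RATIOS, and percent are illustrative only.

```python
# Totals from the table, in millions of words.
TOTALS = {
    "A": 5_175 + 22_500,   # production, 1958-59
    "B": 7_500 + 56_000,   # production, 1970
    "C": 923 + 363,        # accessions, 1958-59
    "D": 1_689 + 818,      # accessions, 1970
    "E": 62 + 108,         # requirements, 1960
    "F": 13 + 13,          # translation, 1960 (taken as the sum of both columns)
}

# The numerator/denominator pairs listed in the table's last column.
RATIOS = [("C", "A"), ("D", "B"), ("E", "A"), ("E", "C"),
          ("F", "E"), ("F", "C"), ("F", "A")]


def percent(numerator, denominator):
    """Return TOTALS[numerator] as a percentage of TOTALS[denominator]."""
    return 100 * TOTALS[numerator] / TOTALS[denominator]


for num, den in RATIOS:
    print(f"{num}/{den} = {percent(num, den):.2f} percent")
```

Rounded as in the table, each computed percentage agrees with the printed figure.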


Dr. King. I would like to point out that in the table the most pertinent
figures are the estimated 27 billion words put out by the Soviet
bloc, of which less than six-tenths of a percent are considered to be 
requirements by the intelligence community, and a much smaller per- 
cent are actually translated. The system we are designing and have 
partially built is aimed at translating about 10 percent of the total
output production. 
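
The 10 percent figure can be set beside the 100-word-per-second rate mentioned earlier with a little arithmetic. The Python sketch below assumes, purely for illustration, that a system runs at that rate around the clock for a full year, and it uses the 1958-59 production total from the table; neither assumption comes from the testimony.

```python
WORDS_PER_SECOND = 100                      # design rate stated in the testimony
SECONDS_PER_YEAR = 365 * 24 * 60 * 60       # assumes continuous operation
ANNUAL_PRODUCTION_WORDS = 27_675_000_000    # table row A, 1958-59

annual_capacity = WORDS_PER_SECOND * SECONDS_PER_YEAR
share = 100 * annual_capacity / ANNUAL_PRODUCTION_WORDS

print(f"capacity: {annual_capacity:,} words per year")
print(f"share of 1958-59 production: {share:.1f} percent")
```

On those assumptions the capacity comes to a little over 3 billion words a year, which is of the same order as the 10 percent stated.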

The equipment we have built lends itself very readily to immediate
switchover from one language to another, so that languages other than
Russian can be handled.

Apart from the method of translation, there are many other very
important functions in translation, which only the IBM group has
examined and solved to a large degree. First is the actual labor
and detail of loading the memory with a substantial dictionary, not 
only for tests, but to estimate the cost of a full-fledged system. IBM 
has expended considerable funds through its own personnel and 
through subcontracts with Syracuse University in building up a word 
list from the original work done by the University of Washington 
under another RADC contract. This is necessary so the method and 
system can be evaluated, without having to be distracted by lack of 
words in the dictionary. 

We found many years ago that it is very dangerous to restrict one's
studies to a single field or discipline, and have felt it essential throughout
the research and development phase of the work that the whole
language be covered. It is relatively easy to resolve ambiguities if
the field is restricted, but the programs to do this fail when the words
occur in different contexts. No writer limits all his text to a single
field; astronomers, for example, write about “red giants” and “white
dwarfs,” which happens to be a case where human translators had a
lot of trouble trying to find what the technical equivalent of those were.

Only recently has our dictionary been adequate. It now has 55,000
stems. With all the endings also listed, this corresponds to perhaps
half a million words as they appear in text. We believe that the
dictionary must ultimately contain 400,000 stems.
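
Editorial illustration, not part of the testimony: the relation between stems and text words can be sketched as a lookup that splits each word into a listed stem plus a listed ending. The stems, endings, and glosses below are invented toy data in Roman transliteration, and the longest-stem rule is an assumption, not a description of the IBM dictionary organization.

```python
# Toy data: three transliterated stems and ten endings (including the empty one).
STEMS = {"rabot": "work", "narod": "people", "plan": "plan"}
ENDINGS = {"", "a", "u", "y", "e", "om", "ov", "am", "ami", "akh"}


def lookup(word):
    """Return (stem, ending, gloss) for the longest listed stem, or None."""
    for cut in range(len(word), 0, -1):
        stem, ending = word[:cut], word[cut:]
        if stem in STEMS and ending in ENDINGS:
            return stem, ending, STEMS[stem]
    return None


def surface_forms():
    """Upper bound on distinct text words the toy dictionary can recognize."""
    return len(STEMS) * len(ENDINGS)


print(lookup("narodov"))
print(lookup("sputnik"))
print(surface_forms())
```

On the figures given in the testimony, 55,000 stems corresponding to about half a million text words works out to roughly nine forms per stem.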

There are other necessary features in translating, such as punctuation,
numbers, letters of other languages, et cetera, which convey
information. The handling of these in the processing of text has been
solved by our machine organization. To facilitate reading, attention
must be paid to format—capitalization, paragraphing, hyphenation,
and the like—provision for which has also been made.

There are two other very serious equipment problems facing us even
in the current stage of research and development, namely, input and
output. Currently we have to prepare the input text on punched
paper tape by typists using Flexowriters. I might say we have found
it possible to take girls in our laboratory, and in 3 hours teach them the
Cyrillic alphabet, and have them type from the original text on a
Flexowriter at the normal typing speed with practically no errors, so that
we do have a practical input to our machine. At the present time
we have two such girls typing essentially all day preparing tape for
our machine. As was indicated in the last testimony, our machine
can handle tape a lot faster than it can be prepared at the present time.

Output is typed in upper and lower case letters, with punctuation
and complete format control, on a Flexowriter driven by the machine.

(Fold-in exhibit. Russian original from Pravda, May 8, 1960: concluding
remarks of N. S. Khrushchev at the session of the Supreme Soviet of the
U.S.S.R., May 7, 1960. The Cyrillic text is not legible in this copy.)

Output of the IBM automatic translation system:

Pravda May 8, 1960

Concluding Word Comrade N. S. Khrushcheva
on session Supreme Soviet USSR 7 May 1960 year

Comrade deputy!

All appearing on session expressed full consent with/from positions,
advanced in reports, and unanimous supported offer Soviet government
about cancellation taxes with/ [illegible] Soviet people, and about
completion in 1960 year translation all worker and employee on
abbreviated worker day. In own appearances deputy unanimous approved
inside and foreign policy Soviet government.

In own appearances deputy talked about great political and creative
rise our people, about glorious labor matters worker, collective
farmers, intelligentsia Soviet country.

Seven-year school/plan we began good, comrade. Are carried out and
overfulfill industrial plan, grow storage in national economy, increase
productivity labor. Great will people, it/its selfless labor, socialist
competition millions - correct guarantee successful accomplishment and
overfulfillment seven-year plan, further rise economics, development
science and culture, increase welfare Soviet people.

Talking about successes, we always should critical look at all side our
activity, not calm on reached, constant care about that in order to
completely use having by us great reserves and possibility for
high-power development all branch national economy.

A conventional, human translation of the first four paragraphs of
N. S. Khrushchev's May 7 speech, the whole of which was rendered
into unconventional English by the IBM automatic translation system
and submitted to the Foreign Language Translation Hearings conducted
before the House Committee on Science and Astronautics May 16, 1960:

The Concluding Remarks of Comrade N. S. Khrushchev at the Session
of the Supreme Soviet of the USSR on 7 May 1960

Comrade Deputies!

All those who spoke at the session expressed complete agreement
with the positions advanced in the reports, and unanimously supported the
proposals of the Soviet government for abolition of taxes collected from
industrial, office and professional workers and for other measures directed
at improving the welfare of the Soviet people, and for the completion in 1960
of the transition of all industrial, office and professional workers to a shorter
working day. In their speeches the deputies unanimously approved the internal
and foreign policy of the Soviet government.

In their speeches the deputies spoke about the great political and
creative development of our people, about the glorious feats of labor of
the workers, collective farmers, intelligentsia of the Soviet country.

We began the seven-year plan well, comrades. Production plans
are being fulfilled and overfulfilled, reserves in the national economy are
growing, the productivity of labor is increasing. The great will of the people,
their selfless labor, the socialist competition of millions are a reliable
guarantee of the successful accomplishment and overfulfillment of the seven-year
plan, of the further advance of the economy, of the development of science
and culture, of the improvement in the welfare of the Soviet people.

While speaking about successes, we should always critically examine
all aspects of our activity, we should not rest content with what has been
achieved, we should constantly be concerned about complete utilization of
the great reserves we have and of the possibilities for the powerful
development of all branches of the national economy.

I would like, if I may, to enter an exhibit in the record, which I
believe you have. It is an actual facsimile print-out of the text from
Pravda, May 8, on the U-2 incident—at least Khrushchev’s speech.
The Chairman. Wait a minute, where is that?

Mr. Bracu. Colonel Dillon has some copies of that. 

Colonel Dillon. Yes, sir.

Dr. King. You have, I believe, all the articles.

The Chairman. All right, sir, we have it.

Dr. King. We are only presenting this as an indication that our
equipment really works. You can put text in and you get output out
of it. We feel very strongly this is by no means the end. We have 
very definite ideas of what should be done and what can be done to 
improve this output very considerably in the very near future. 

(The exhibit referred to is the fold-in.)

Colonel Dillon. How much of that do you wish to have in the
record?

Dr. King. As much as you wish, sir.

The Chairman. I think part of it ought to go in the record there.
I notice the machine didn’t recognize “Lockheed.” That is on page 
3 in the text. It couldn’t get the meaning of Lockheed. 

Dr. King. Well, the reason for that, sir, is when the Russians talk
about our companies or our people, they transliterate it into their own
Cyrillic symbols, their own alphabet. They don’t translate it. When
we come to these words, they are naturally proper names not in our
dictionary, so instead of holding up the machine, the machine
transliterates them back into the Roman alphabet. So there is a double
transliteration, phonetically from English to Russian and Russian
to English.

The Chairman. Adana, the airport over in Turkey, is also
transliterated.

Dr. King. I think you will find many examples of proper names
transliterated. 

The Flexowriter has a feature which we made use of: when we come
to these words not in the dictionary and transliterate them, they are
printed in red, which helps us in increasing our dictionary, and also
has some intelligence interest in that you can scan the pages of our
output and pick out proper names or new terms as they appear in the
literature.
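
Editorial illustration, not part of the testimony: the fallback described above (look the word up; if it is absent, transliterate it and flag it) can be sketched as follows. The two-word dictionary and the romanization table are assumptions chosen for illustration; the flag stands in for printing in red.

```python
# Toy dictionary; a real one would hold tens of thousands of stems.
DICTIONARY = {"самолет": "airplane", "фирма": "firm"}

# One common romanization of the Russian alphabet, used here only as an example.
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def render(word):
    """Return (text, flagged). flagged is True when the word was not in the
    dictionary and was transliterated instead of translated."""
    key = word.lower()
    if key in DICTIONARY:
        return DICTIONARY[key], False
    roman = "".join(TRANSLIT.get(ch, ch) for ch in key)
    if word[:1].isupper():
        roman = roman.capitalize()
    return roman, True


def new_terms(words):
    """Collect the flagged (transliterated) words, e.g. for dictionary upkeep."""
    results = [render(w) for w in words]
    return [text for text, flagged in results if flagged]


print(render("Локхид"))
print(new_terms(["фирма", "Локхид", "Адана"]))
```

A name that went from English into Cyrillic by sound comes back by the same route, so the Roman spelling that results need not match the original English spelling.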

As a first step toward improving the input we are getting away 
from prevalent one-key-at-a-time punching methods. This program 
is well advanced. In addition, we are pursuing our own research in 
automatic print reading as well as supporting a development pro- 
gram at the Baird-Atomic Corp. 

With regard to output, a recently completed lexical buffer memory 
makes it possible to record the English translation on magnetic tape,
which can then be used to operate a mechanical printer. In addi- 
tion, studies are being made toward the development of high-speed 
electronic printers with many fonts and enough type characters to 
reproduce the symbols in mathematics and other scientific texts. 




The fiscal year beginning July 1958 was the first in which IBM 
received support from the Air Force on translation work. This 
amounted to $567,804. In that same period IBM spent $381,305 
on the program. In the current fiscal year, 1959-60, the Air Force 
is spending $1,767,469 and IBM $961,069. IBM has budgeted an- 
other $657,950 for the first half of the fiscal year beginning in July 
1960. Thus far we have spent $396,000 on outside consultants,
including the Baird-Atomic Corp. Apart from the Baird-Atomic support,
none of the funds which IBM has spent in the independent
pursuit of the print-reading and printing problems are included in
the foregoing figures.

The Chairman. So your company has spent considerable money
on this particular development?

Dr. King. Yes, sir.

The Chairman. You would, of course, expect to gain that back when
you perfect a system that is usable?

Dr. King. I hope so, sir; and on the other hand, we have a legitimate
interest in doing this kind of research, because as I mentioned earlier, 
it is sort of a prototype problem in information retrieval which is of 
considerable interest in the future of IBM machines. 

The Chairman. But it puts you in the vanguard of the development
of this character. It is also an additional development of the
computer, isn’t it—the IBM machine? 

Dr. King. Yes, sir, it is—in general machine technology.

The Chairman. So it is along the line you have already been pursuing
for years in the development of electronic computers?

Dr. King. Yes.

And in research, our duty is to improve the performance of the 
company’s machines. 

The Chairman. What do you think of the standard computers,
such as you build at IBM, and how can they be used for language
translations?

Dr. King. Naturally, I have one of these machines, the 704, in my
laboratory, and we use this primarily as backup in preparing the 
dictionary, but we also use it for experimental work on the linguistic 
side of the problem. We find it has only limited use, because machines 
of this type are not really suitable as they are. They are really com- 
puters, and there really isn’t anything to compute in the language 
problem. 

The Chairman. You ultimately may go to an independent machine,
rather than use a computer? 

Dr. King. Yes, sir. Now, the equipment we use will not necessarily
be completely independent, but will represent a new machine
organization, as was more or less indicated by the Air Force testimony. 
Parts of the equipment are pieces of our latest type of machine; 
namely, the Stretch machine. 

The Chairman. What is the status of this project at IBM?

Dr. King. I would say that most of the elements of a complete
complex have been built and we are getting some operational
experience with them. There is a great deal, however, to be done in
increasing the sophistication of the method of translation. We are
pleasantly surprised with the response we get with our present
output, but we certainly do not feel this is a translation. This is just our
working material with which to build our further theory.

The Chairman. When do you think the output of these machines
is going to be usable by scientists and engineers?

Dr. King. As I said, I was pleasantly surprised by the reception
our current output has had, and we actually have scientists in our
own laboratories not interested in mechanical translation at all beat- 
ing on our doors to put some of their material through the machine. 
For example, we have a group in industrial process control who indi- 
cated to me they would rather read the Russian literature than the 
American literature, because the Russians are very advanced in this 
area. 

Just before leaving I asked—I didn’t ask, myself, because I thought 
that might bias the situation—but had someone else ask how Dr. J. E. 
Bertram felt about the output we had been giving him, and I wonder 
if I may read this. Dr. Bertram is the manager of the automatic 
control theory group at IBM. 

The Chairman. I wish you would.

Dr. King (reading):

Dr. King’s machine has been useful for giving one an insight into Pontryagin’s 
work. Many of the paragraphs were well translated, and with a little work, 
although I don’t speak Russian, I could get the whole picture. I seldom read 
an entire technical article, but examine the equations first, then try to get illu- 
mination from accompanying text. In some instances I was amused by the 
fact that the machine acted like a truth machine converting a polite
circumlocution in Russian to blunt English. Incidentally, mathematics seems to go well
on the machine—probably because mathematicians use rather stilted language.

I think I can remember one. At the end of a chapter it said “Stop 
now, no further results.” [Laughter.]

No; it was, “Stop now, short on results.” The human free transla- 
tion was “We are interrupting briefly our results, and going on to 
something else.” 

The CuatrmMan. You say you can switch from Russian to, say Ger- 
man. What sort of job is it to switch a machine from one language to 
another ? 

Dr. King. You have seen a disk on which the dictionary and the
processing material is recorded. These disks are rotating on the
shaft on a small piece of equipment which is the memory, and these
particular units do not cost very much money, so that one can have
several of them. One could rapidly change the disk on a single one.
This means by a flick of a switch one could have the search reading
either one disk or another. So as soon as you get the punched paper
tape in the machine, we can switch over. This compares with the use
of an ordinary computer, where the input text has to be punched on
cards. All these cards are sorted and the programs have to be
switched on magnetic tapes so that it takes quite a while to get the
memory filled. If you want to switch over, it is a 2- or 3-hour
operation.

The Chairman. Some of the witnesses before you have testified that
translation would be useless unless it is entirely accurate. Would you
say that that is the case, or that you might use translations that are
not 100 percent perfect?

Dr. King. Certainly, speaking as a scientist, I, myself, could read
output perhaps that ought to be somewhat better than we are getting 
now, but I do not feel we have to have it 100 percent accurate. For 
example, I read the New York Times and get a good deal of informa- 
tion. I even read the previous testimony of this committee, which is 
not grammatically perfect, but I don’t get less informed than the 
usual scientist is. 

The Chairman. Do you include speech input in your program now?

Dr. King. Yes. We have a project at IBM which we call the Stenowriter
project, which is going along in parallel, in which the input
to the Photostore is the output of Stenograph that the Recorder is now 
using. The Stenograph is wired directly so that speech can be put 
into the translating equipment at the rate at which it is spoken and 
translated as fast as the written material can be put in. 

The Chairman. So then you would be able to get Mr. Khrushchev’s
speech, for instance, when it is delivered, just as it is delivered?

Dr. King. Yes, sir.

The Chairman. And you would not have to depend on the sense
of the material coming out later? 

Dr. King. That is right. One could translate from a radio broadcast
or a telephone conversation.

The Chairman. What is the relationship between the amount of
material being published by the Soviets, the portion you believe should
be translated by the machine and the portion that is actually being
translated?

Dr. King. As I said in my statement, we have had an independent
group looking into this. The results at the moment are preliminary;
the final report has not been rendered. I think I briefly gave the
figures there: of the 30 billion words that the Soviet bloc is
putting out, we feel something like 10 percent should be translated
automatically in this country.

We also feel that this translation should be not limited to scientific 
and technical information. I am sure there is a great deal of informa- 
tion for our engineers and for our intelligence community from other 
types of publications equivalent to our house organs and trade
journals. For example, the U-2 was described in a model airplane 
magazine, and also in a Japanese trade journal many years ago. 
There is always the famous case of the sputnik transmission numbers 
whose frequencies were available in what is equivalent to our Avia- 
tion Week, which is generally not considered a scientific publication. 

I believe you said, yourself, it might be profitable to have Pravda 
translated, so that various people in this country could have it on their 
desk in the morning just to review the situation.

The Chairman. I would think so.
Mr. Fulton, do you have some questions?

Mr. Fulton. On the statement, the summary of the preliminary
survey of the need for language translation in the U.S. Government,
on line 6, it seems it should read “The world production of books, for
example, was shown to be well over a quarter of a million titles
‘annually,’ ” not “already”; the word there is “already.”

Dr. King. I am not too sure about that, sir. May I just see the
original text?

Mr. Fulton. I will say this to you, we have 114 million volumes of
scientific material, monographs, and in serials, in the Library of
Congress alone.

Dr. King. This is referring only to books, not serials. I am afraid
that is what it says in our report, but I will ask our contractors for 
a clarification. 

Mr. Fulton. I am sure it is more than a quarter of a million. Are
you speaking of books, or are you speaking of the various groups the
books fall in? If you are speaking of books, I think the volume is
wrong. For example, we get on the monthly list of the index of
Russian accessions in the Library of Congress, 8,900 a year, and about
5,000 of those, or 58 percent, are in science and technology, but that
is dealing with titles.

The question is, What do you mean by this? 

Dr. King. By “books” we mean a nonperiodical publication of 49
or more pages. 

Mr. Fulton. Well, I think you better look at it, because it isn’t clear
to me what you mean. I think you probably mean the divisions into
which books fall, and that would be the monthly index titles of 
Russian accessions. This isn’t clear to me. 

Dr. King. I can now clarify this. By “already” Planning Research
Corp. meant in the year 1959. 

Mr. Fulton. Then the question comes as to how practical this will
be. Are we going into so many semantics and language variations
that it is becoming less and less productive, or are we pretty much
aiming these programs for practical use? If you become widely
spread and disseminated the returns are not worth the effort or the
money. Are we getting into that?

Dr. King. I do not think we are, in our program, because years ago
we were looking at word-for-word translation. In the last few years
we have extended it to phrase for phrase, and are trying to extend it to
sentences. If one gets a complicated sentence, we may not be able to
translate it, but we may not need to get that much perfection, so that
by working from the inside out we can cut the development off at any
time it seems practical, whereas if one tries to say you are going to
make a perfect analysis of the sentence, you are committed to the
perfect solution from the beginning and may therefore never approach
it.
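
Editorial illustration, not part of the testimony: the step from word-for-word to phrase-for-phrase translation can be sketched as a lookup that tries the longest listed phrase first and falls back to single words. The transliterated entries and the greedy longest-match rule are assumptions for illustration, not the IBM method.

```python
# Toy phrase and word tables in Roman transliteration (invented entries).
PHRASES = {
    ("semiletnii", "plan"): "seven-year plan",
    ("narodnoe", "khozyaistvo"): "national economy",
}
WORDS = {
    "my": "we", "nachali": "began", "khorosho": "well",
    "plan": "plan", "narodnoe": "people's",
}


def translate(tokens, max_len=3):
    """Greedy longest-match: prefer a listed phrase, else translate one word.
    Words found in neither table are passed through unchanged."""
    out, i = [], 0
    while i < len(tokens):
        # Try phrases of max_len words down to 2 words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:
            # No phrase matched here; fall back to word-for-word.
            out.append(WORDS.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(out)


print(translate("semiletnii plan my nachali khorosho".split()))
```

Because the fallback is always available, the phrase table can be extended step by step and the work stopped at whatever level of coverage proves practical.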


Mr. Fulton. You have on page 2 of your statement:


Some of these efforts are very ambitious in their commitment to seek analysis
of full sentences.

The question is: Are we trying to be too ambitious and too sophisticated
and too involved today, so that we aren’t coming up with a
practical method? Are we going to be doing so much research and is
it so far afield that it is going to interfere with what you people in the
IBM, for example, would consider as practical?

Dr. King. I feel that that type of research certainly ought to be
continued at its present level, because it leads us to greater under- 
standing of languages, not only for mechanical translation, but for the 
general problem of conveying information, which after all is our real 
objective. 

Mr. Fulton. Of the various agencies that are doing the research,
you will be working, of course, with the practical end of that research,
as much as any group. Can you explain to us, or put in the record,
what the varieties of disagreements are as to the approach, so that this
committee can see what might be the best approach from a practical
point of view?

Dr. King. I do not think there is anything one would call a disagreement in
approach. As in any new scientific undertaking, there are various
ways of starting the problem and each group exploits various aspects.
One of the things we are trying to do with our theoretical group is to
really understand these different approaches and to try to incorporate
them all into one thing we call a mathematical model, so that we can
say, “Well, here is the general theory; this part of it has been thoroughly
done by such-and-such a university and this part by another.”
But I don’t think these are conflicting efforts. In the same way
nuclear physics people do all sorts of experiments with the Atomic
Energy Commission, and the Atomic Energy Commission takes them
all.

Mr. Fulton. When we look at the total amount of literature that is
being bought by this country from Russia, received in exchange, plus
the total that would be available on exchange from eastern European
countries, there is a tremendous amount and it becomes almost necessary
to have some sort of mechanical translation, doesn’t it?

Dr. King. Yes, sir; it does.

Mr. Fulton. I can give you the figures on the purchases by the
Congressional Library. It is 57,000 pieces a year, bought through about
50 bookdealers. Then, in addition, we have the 31,000 pieces we
get on exchange from Russia, making about 88,000 pieces that we are
getting on accessions for our Library of Congress alone in 1 year.
And then in addition we have 39,000 additional items from Eastern
Europe. When you just add up the volume for translating you get a
tremendous requirement, so that you feel this is really a basic necessity
if the United States is to keep its predominance in science and keep its
predominance in scientific accessions in the Library of Congress. We
also want to insure that we aren’t duplicating something that has
already been researched and accomplished in other areas and fields in
the world. Don’t you think that is necessary?

Dr. King. Sir, if I could say, many of us in IBM Research read all the
Russian translations we can get hold of for ideas and to see the general
trend of their thinking.

Mr. Fulton. But there is a basic and urgent necessity to shorten
the time of 5 to 8 months on translation, and secondly, to broaden
the field so that we can move on a broad scientific front; is that not
true?

Dr. King. Yes, sir; very true.

I would like to also quote some figures from the Planning Research
survey where we find that delays in getting things translated are much
higher than has been given in other testimony. The delays are really
between 6 and 21 months for the kind of things we get, say, at IBM
Research as is shown on the submitted chart (fig. 10).


(Figure 10. Approximate lag time in months, taken as the date of the
volume less the publication date of the item. Chart labels: Russian;
subject, medicine; type of extract, JPRS; sample size, 50 items; items
grouped by number of pages.)



For instance the translation of the Russian equivalent of our Jour- 
nal of the Optical Society is 18 months late in coming to me. This 
is a long time in this day of rapid technological advance. I think 
that the biggest thing we want to lick is the lack of timeliness in trans- 
lation. 

Mr. Fulton. There is a real field here, and there is urgency in the
United States, through its Government agencies, to get moving to 
initiate a strong effort so that we can remain preeminent in science, 
medicine, and in all these various fields. The fund of world knowl- 
edge could become readily available to us if we had the means of 
translation. 

Dr. King. Yes, sir.

Mr. Fulton. That is all.

The Chairman. Colonel Dillon, do you have any questions?

Colonel Dillon. Yes, sir.

In the May 8 edition of Pravda that you gave the committee, you 
show the Russian original and the machine translation. I would 
like you to furnish the committee about three paragraphs of post- 
edited machine translation, so that we could compare it with what 
the translation really should look like. It doesn’t show us very much 
right now. I would like a comparison. Could you do that? 

Dr. King. Yes, sir; we could today, I believe.

Colonel Dillon. All right. It doesn’t have to be today. I was
up to visit the IBM facility recently, and I see you have a new setup,
a new building coming into being very soon. Do you plan on
using this new installation for putting up this model translation
center—machine translation center for the Air Force?

Dr. King. We have not talked to the Air Force seriously about a
translation center. At the present time we feel we are in the research 
and development program, and there is a lot more work to be done. 
However, we feel that it is timely to consider the setting up of an 
organization. Mr. Fulton, himself, pointed out, for example, the 
tremendous problem of getting the books. That is part of the organ- 
ization. How do you get the book into the system ? 

There are a lot of factors, apart from the translation itself. We 
ought to start thinking them out now so that we do not find ourselves 
in a panic at a later stage.

Colonel Dillon. Will the Baird-Atomic machine be brought to the
IBM facility for further research?

Dr. King. I believe it is the plan of the Air Force to have the
Baird print reader delivered to our laboratory so we can use it for 
input to our present machine. 

Colonel Dillon. You just mentioned your concept of the future
regarding the machine translation center. Could you elaborate on
that as to what should be the overall national policy toward a machine
translation center, a number of centers, the location of centers, and
the type of installations we should have? Have you any ideas on
that?

Dr. King. I believe the technical part of the problem, say dictionary
lookup, is well enough in hand so that one could consider a single
center where all requirements can be met by that single center. It
would save a lot of money if all the efforts on the linguistic work were
put into this single machine organization. I think this single center
could handle all the requirements and coordinate them and consolidate
them for general information and intelligence for this country.
I am sure that there always will be necessity to have outlying translation
equipment, say with the Army for field data. As Mr. Fulton
said, if we could package the equipment in a small enough box, we
could get it into airplanes.

That is a big engineering request. 

Mr. Fulton. You amaze me. This disk fascinates me.

Dr. King. That part is fairly well under control. It is the
electronics that has to be made that small now.

I would like to amplify this a bit. The equipment we are building 
is in one sense a prototype, but certain parts of it I think are final 
equipment, so that one center could have one piece of equipment 
which we will stepwise put into its final shape. We are not thinking 
in terms of building one big SAGE system, for example, but getting 
started on what we call an evolutionary complex, where pieces of the 
equipment are phased in as they turn out to be operational. 

The Chairman. Doctor, in the sample you have given us regarding
the remarks of Mr. Khrushchev, you have two or three pictures there.
Were those pictures placed there by your translating machine?

Dr. King. No, sir. I want to say, though, the actual text, you see
is a xerographic copy of what actually came out of the machine, un- 
touched. 

The Chairman. Do you regard this as a very good translation?

Dr. King. I don’t consider this a translation at all, sir. This is
just our output at the present stage. 

The Chairman. It is not a translation? It is an output. What is
it?

Dr. King. It is a rendering of Russian into pidgin English.

The Chairman. So you do not regard it as a translation?

Dr. King. No, sir.

The Chairman. Doesn’t it approach the usable?

Dr. King. Well, sometimes it does and sometimes it doesn’t. I
happen to have in my briefcase another thing we put through our 
machine which happened to be a manual for a tinkertoy set. It is 
the worst one we ever did. But just to show how the situation im- 
proved in the last few months, I would be glad to show you this per- 
sonally afterward, and you will see that there is a good deal more 
trash in the output of the machine. 

The Chairman. Well, but looking over the translations of what 
Mr. Khrushchev is supposed to have said, I can get considerable 
information out of it. I do not speak Russian. I know I wouldn't 
get anything out of looking at the Russian text. [Laughter.] 

It does give me something, at any rate, even though it is pidgin 
English. 

Dr. King. I am very glad to hear it. Incidentally, if I may, I 
would like to second General Graul’s invitation to you and the mem- 
bers of the committee. We have open house in our laboratories. The 
machine is on the air most of the time, and we like to have people come 
and bring their own text and have it put through on the spot, and 
actually see the output so that you know it is really genuine. 




I hadn’t seen this rendering of Mr. Khrushchev’s speech until this 
morning, so I haven’t had time to read it. This happens to be a 
rendering that is pretty good. 

Mr. Fulton. There is a real point there. Why don’t you proceed 
along a certain type of English that has a practical application? 
Why don’t you pick up a particular field and try to complete that field 
as one phase of your investigation and go the full length in a limited 
field? 

Dr. King. We have done that, sir. We have another program which 
I haven’t mentioned, of looking up French. One of the advantages 
here is that most of our engineers know French and know the sub- 
tleties of the language, although they do not know the subtleties of 
Russian, even though they may be familiar with Russian. I also 
have an exhibit of a French rendering which I would like to put in 
the record. 

The Chairman. We have an exhibit here, if there is no objection to 
putting it in the record. 

(The exhibit referred to follows on pp. 160 and 161.) 

INTRODUCTION 


DEFINITION DE LA LOGIQUE MATHEMATIQUE. La logique algébrique qui 
est le sujet de ce cours, est conçue ici comme la partie la plus 
élémentaire de la logique mathématique. Plus tard nous préciserons 
ce que nous entendons signifier par le mot "algébrique". 
Mais il faut indiquer tout de suite en quoi consiste la logique 
mathématique dont la logique algébrique constitue la première 
partie. 


Dans cette intention rappelons que le mot "logique" a trois 
sens différents dans presque toutes les langues. 


Dans le premier cas, la logique est la science du raisonnement; 
elle veille à distinguer entre les raisonnements valides 
et ceux qui ne sont pas valides, à expliquer la relation entre 
prémisse et conséquent. La logique prise dans ce sens, fait, 
depuis le temps des anciens, partie de la philosophie. J'emploierai 
le nom de "logique philosophique" pour la logique prise dans 
ce sens. 


Dans l'étude de la logique philosophique, on a trouvé utile 
d'employer des méthodes mathématiques; ceci revient à dire que 
l'on a construit des systèmes mathématiques qui ont quelque 
rapport avec la logique philosophique. 


Ce qu'est un système mathématique, et quelles sont ses 
relations possibles avec la logique philosophique, ces questions 
seront examinées plus loin. Observons dès maintenant qu'on peut 
étudier ces systèmes pour eux-mêmes, et que quelquefois on 
appelle "logique" l'étude de ces systèmes. La logique, ainsi 
conçue, fait partie de la mathématique; je l'appelle la logique 
mathématique. 


Machine Print Out of Translation by AN/GSQ-16 (XW-1) Photostore: 
A: French to English 

INTRODUCTION 
DEFINITION (OF) THE MATHEMATICAL LOGIC 

The algebraic logic which is the subject of this course/s is 
conceived here as the part the most elementary (of) the mathematical 
logic. Later we/us will specify what we/us hear/mean signify by 
the word ''algebraic''. But one needs indicate immediately in what 
consists the mathematical logic whose algebraic logic constitutes the 
first part. 

In this intention recall that the word ‘'logic'' has three 
different sense/s in almost all the languages. 

In the first case/s, the logic is the science of the reasoning; 
it watches to distinguish between the valid reasonings and those 
which are not valid, to explain the relation between premis and 
consequent. The logic took in this sense makes /made, since the time 
(of) the ancients, part/departed (of) the philosophy. I would use 
the name of philosophical logic for logic took in this sense. 

In the study of the philosophical logic, one has found useful 
to use (of) the mathematical methods; this amounts to saying that 
one has construct/ed (of) the mathematical systems which have some 
ratio/relation with the philosophical logic. 

What is the mathematical system, and which are its possible 
relations with the philosophical logic, these questions will be 
examined further. Observe since now that one can study these 
systems. The logic thus conceived, is a part of the mathematical; 


I call it the mathematical logic. 




Dr. King. Thank you, sir. I think you will find this is a good deal 
better than the Russian we have put out. The main reason for this 
is that the wordlist and the processing has been restricted to one 
field, mathematics. 

The Chairman. I would think Mr. Fulton’s idea is a very good 
one, that you might perfect one field like mathematics, for instance, 
then you would have something usable as a product which you could 
immediately proceed with. 

Dr. King. That we can do very well within our program, and our 
planning certainly is to do it for French, and considering whether we 
should now start in the Russian area to do that. I might say this 
French effort only took about two people 3 months to do, so that it is 
not going to be a tremendous program to specialize in one field. 

The Chairman. We certainly thank you, Doctor. These specimens 
are really worthy of study, and I am going to look them over very 
carefully and get a better idea of where you are in your approach 
to a perfect system, or at least a usable system. So I want to thank 
all of you gentlemen who have come here as witnesses this morning 
to testify before this subcommittee. 

If there is no further business, the subcommittee will adjourn until 
tomorrow morning at 10 o’clock. 

(Whereupon, at 12 noon, the subcommittee adjourned, to reconvene 
at 10 a.m., Tuesday, May 17, 1960, on another subject.) 

(Additional statements submitted to the subcommittee follow in the 
appendix.) 

STATEMENT ON NAVY MACHINE TRANSLATION RESEARCH BY THE CHIEF OF NAVAL 
RESEARCH, MAY 28, 1960 


NAVY MACHINE TRANSLATION RESEARCH 


The field of machine translation research looking toward a system capable of 
translating from one language to another automatically can be divided roughly 
into three categories. These are (1) the computer input and output devices, 
(2) the computer itself comprising not only the electronic circuitry capable of 
making logical decisions but the auxiliary memories as well, and (3) the com- 
puter program and associated lexical, syntactic, and semantic techniques enabling 
the computer to translate languages automatically. Research leading to a com- 
pletely automatic capability should proceed along each and all of these three 
lines. 

Of course, it should be appreciated that research in each of the three cate- 
gories listed would have considerable application to the solution of many 
scientific and engineering problems other than those in machine translation and 
might well be motivated by the necessity to solve numerous varied types of 
problems. Accordingly, research in these areas, although of great importance 
and significance to machine translation research is frequently not supported 
specifically under this category of research. 

Since the very inception of the high-speed digital computer, the Navy has 
been heavily involved in the support of computers, computer organization, 
programing techniques, computer technology, and auxiliary computer devices. For 
example, the Navy supported at least in part some of the earliest and most 
imaginative attempts to build high-speed digital computers such as the Harvard 
mark II and mark III computers, the Institute for Advanced Study computer, 
the Whirlwind computer, the Naval Ordnance Research computer and the 
George Washington University logistics research computer. These computers 
provided much of the technology and organization for existing and proposed 
high-speed computers so necessary for machine translation. 

The Navy is currently sponsoring, at least in part, a number of the newest and 
most advanced high-speed computers such as the Remington Rand LARC, the 
Philco TRANSAC and computers at the University of Illinois and the University 
of California at Los Angeles and is supporting million-dollar contracts at 
IBM, Sperry Rand, and RCA to consider novel technology for the design of 
computers that operate in a billionth of a second. 

With regard to the question of auxiliary computer components, particularly 
high-speed, high-density memories, the Office of Naval Research has always 
heavily sponsored in the past and currently continues to sponsor research in 
these areas. For example, the large photomemory which is the heart of the 
IBM machine translation system, was initially developed under a Navy contract 
with the International Telemeter Corp. from June 1953 to August 1954 with Dr. 
Gilbert King, who is now at IBM, as principal investigator. The Office of Naval 
Research sponsored the basic research leading to the magnetic core memory 
under Project Whirlwind at MIT, as well as the magnetic drum memory under 
contract with Engineering Research Associates which subsequently merged with 
Remington Rand. Magnetic cores and magnetic drums are still the main 
memories used in present day digital computers. 

Currently the Office of Naval Research is sponsoring a number of tasks toward 
the development of other types of high-speed, high-density memories which 
will be of great use in their application to machine translation problems. These 
include, among other tasks, a novel photomemory of considerably different design 
and potentially greater capabilities than that of IBM and Dr. King. This 
photomemory is being developed under contract to MIT and Hydel, Inc., of 
Waltham, Mass. 

The first category mentioned, namely computer input and output devices, has 
been recognized by the Navy as an area of the greatest importance insofar as the 
use of computing techniques is concerned. Both the Office of Naval Research 
and the Bureau of Supplies and Accounts have been actively concerned with 
research in this area. 

The Navy now supports a number of research projects which will lead to 
computer input devices which should be able rapidly and easily to read a large 
variety of different type fonts as well as eventually handwritten characters. 
Among these, the Navy supports projects concerning the “Perceptron” at Cornell 
Aeronautical Laboratory and Cornell University. Other projects under Navy 
support in this area exist at the University of Chicago and the United Research 
Corp. in Cambridge, Mass. The Bureau of Ordnance had supported from Janu- 
ary 1954 to July 1957 fundamental research in pattern recognition at Baird- 
Atomic, Inc., in Cambridge, Mass., which led to the print reader described before 
this committee. 

Much work along these lines currently is being carried on at the Research 
Laboratory for Electronics at MIT and Lincoln Laboratory of MIT in Lexington, 
Mass. Both of these are three Service contracts in which the Navy participates. 
The Navy carries on research on the recognition of speech, eventually as a means 
of computer input, at the University of Michigan and has supported related 
work at IBM. 

With regard to computer output devices, the Navy has supported work which 
led to the now widely used Stromberg Carlson high-speed printer capable of 
printing 5,000 lines per minute. 

The third category mentioned, namely, the computer program and associated 
linguistic research has been recognized by the Navy to be very adequately 
supported by other agencies, particularly the Air Force, the Army, the Central 
Intelligence Agency, and the National Science Foundation. The Navy expects to 
profit very considerably from these efforts. Accordingly, apart from activities 
in support of the needs of the Office of Naval Intelligence not met elsewhere, 
the Navy limits its activities in this regard to small imaginative fundamental 
efforts where unique and significant ideas are likely to arise rather than to the 
more massive development type of support. 

The Navy currently supports research in machine translation of a general 
critical nature at Hebrew University in Israel and at Wayne State University, 
where a small group is considering the translation of mathematical literature 
from Russian to English. This group cooperates very closely with groups at 
Georgetown University under CIA sponsorship and Ramo-Wooldridge under Air 
Force sponsorship. From 1952 through 1956 the Navy supported a basic study of 
Russian linguistics also at Wayne State University. This resulted in a most 
important book by Prof. Harry Josselson, “The Russian Word Count.” 

The Navy encourages and relies heavily on the dissemination of information 
among the various groups concerned in machine translation. Accordingly, the 
Office of Naval Research assisted in the sponsorship of a conference on MT at 
the University of California at Los Angeles in February 1960 and is jointly 
sponsoring with the National Science Foundation an MT working conference 
of the various groups in the field at Princeton in July of this year. Also the 
Navy assisted, through its Project Focus, in the sponsorship of a symposium on 
the structure of language and its mathematical aspects at the American 
Mathematical Society meeting in New York City in April 1960. 

The Navy is involved in a number of coordination activities in MT through its 
membership on various governmental committees. Some of these are the Sub- 
committee for Mechanical Translation of the Committee on Documentation of 
the Intelligence Board, the National Science Foundation Interagency Commit- 
tee on Mechanical Translation and Information Retrieval, and the Interagency 
Group for Research on Information Systems. 

In addition the Navy supports a number of peripheral activities which are 
closely connected with, although not specifically directed toward, machine 
translation. Those activities supported include research in computer programing at 
Princeton and Carnegie Institute of Technology, and research in information 
retrieval at Hebrew University in Israel, at Benson-Lehner Corp., in Los 
Angeles, and at the University of Pennsylvania. 

The Navy feels, as Prof. V. Yngve of MIT has mentioned in his prepared 
statement to this committee, that machine translation is one facet in the 
communication sciences consisting of artificial intelligence, communication 
biophysics, communication systems, experimental psychology, linguistics, 
neurophysiology, processing and transmission of information, sensory aids for the 
handicapped, social science, and speech communication. Research in all of these 
fields is currently supported by the Navy at MIT and elsewhere and advances are 
expected to have a strong bearing on progress in machine translation. 

The reasons for Navy support of machine translation and auxiliary projects 
are three in number: 

First, good machine translation would be of great and immediate value to the 
Office of Naval Intelligence. Much of the information used by ONI arrives 
in one foreign language or another. Acceptable automatic translation not only 
would increase manyfold the amount of raw data which could be ingested, but 
also could improve the accuracy and consistency of available English transla- 
tions. Perhaps most important, competent analysts would be freed from the 
necessity of personally translating documents which they need quickly, thus 
leaving additional time available for the more abstract aspects of intelligence 
analysis. An added quasi-intelligence benefit to the Navy resulting from the 
availability of good mechanical translation equipment would be the ease of 
translating information from English to the languages of the various foreign 
personnel encountered by naval forces in various parts of the world. 

Second, the Navy has a very great interest in the translation to English 
of foreign scientific and engineering literature. The Navy spends many millions 
of dollars annually in discovering and developing new devices and methods. 
Wide availability of pertinent foreign information could easily shorten the de- 
velopment periods required and reduce the money spent on work already ac- 
complished elsewhere. American scientists would accordingly be considerably 
more aware of foreign projects and would have a broader base of scientific 
research upon which to draw in attacking specific Navy problems. 

Third, machine translation is a most exciting application of high-speed com- 
puter technology. As has been mentioned, many of the problems involved in 
machine translation are common to a number of fields of information processing, 
so that progress in one field results in progress in the others, as well as giving 
additional insights into the solution of problems in other fields. Thus, for ex- 
ample, advances in machine translation will yield simultaneous improvements 
to document storage and retrieval, high-speed data processing, and automatic 
programing. All these subjects are of vital interest to the Navy and contribu- 
tions to them will in many cases have immediate and widespread application. 

Following is a list of contractors of the Navy which are and have been 
specifically involved with research on machine translation. Those contracts 
concerned with other areas of data processing and information technology having a 
bearing, as previously mentioned, on machine translation, have been omitted 
if they have been otherwise motivated. 


Contractor                                Title                        Dates of contract       Amount of    Fiscal year 
                                                                                               contract     1960 funds 

[Illegible], Cambridge, Mass.             Character [illegible]        [Illegible]             [Illegible]  [Illegible] 
Hydel, Inc., Waltham, Mass.               High-speed photomemory       August 1958 [illegible] [Illegible]  [Illegible] 
Wayne State University, Detroit, Mich.    Russian [illegible]          [Illegible] July 1956   21,400       None 
[Illegible] Corp., [illegible]            [Illegible] photomemory      [Illegible]             28,300       None 
[Illegible], Inc., Cambridge [illegible]  Pattern recognition studies  [Illegible]             100,000      None 

STATEMENT ON MACHINE TRANSLATION BY DON R. SWANSON, MANAGER, SYNTHETIC 
INTELLIGENCE DEPARTMENT, RAMO WOOLDRIDGE, A DIVISION OF THOMPSON RAMO 
WOOLDRIDGE INC. 


Mr. Chairman and members of the committee, it is a privilege to present my 
views on machine translation for the record of these hearings. 

I shall outline briefly the factors which I believe to be critical in assessing 
the present state of the art from the point of view of developing a useful ma- 
chine translation capability. 

Let us first separate the problems of machine development from those of 
language research by observing that general purpose stored program computers 
have been available in large numbers for the past 10 years and that these com- 
puters are in principle as fully capable of translating languages as machines ever 
will be. Future machines, particularly if they are specially designed for 
translation, may be more economical for that purpose, but they will not otherwise be 
able to perform operations that cannot be performed with present machines. 
Any general purpose computer may be changed in a matter of minutes from a 
satellite trajectory calculator or a differential equation solver to a language 
translator simply by providing it with a different set of instructions. The 
sequence of instructions that the machine follows for any particular problem is 
called the program for that problem. If we suppose that a general purpose 
computer is somewhat akin to a robot cook who can perform all basic operations 
necessary to prepare any conceivable dish, then for each dish it is clear that our 
cook needs a recipe and the ability to interpret instructions contained in that 
recipe, provided they are written in a prescribed and rigid robot language. 

In the same way a general purpose computer used for tracking satellites is 
following a recipe, or a sequence of instructions which tell it in a precise way 
how to perform the tracking calculations. These instructions, called the pro- 
gram, are recorded or stored in its own memory where they are accessible to 
interpretation one at a time. The same computer can become a language trans- 
lator by erasing the recipe for satellite tracking and replacing it with a recipe 
for language translation. 
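
The point about exchanging one recipe for another can be illustrated with a minimal present-day sketch in Python. It is not drawn from any program described in this statement. The instruction names ("push", "load", "add", "mul", "lookup"), the two recipes, and the memory contents are all assumptions invented for the illustration. The interpreter stays fixed; only the stored recipe changes.

```python
def run(program, memory):
    """Follow a stored program one instruction at a time; return the work stack."""
    stack = []
    for op, arg in program:
        if op == "push":        # put a literal value on the stack
            stack.append(arg)
        elif op == "load":      # fetch a named value from memory
            stack.append(memory[arg])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "lookup":    # replace the top item by its entry in a table
            key = stack.pop()
            stack.append(memory[arg].get(key, key))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return stack


# Hypothetical recipe 1: a toy tracking calculation, position = v * t + h0.
TRACKING_RECIPE = [("load", "v"), ("load", "t"), ("mul", None),
                   ("load", "h0"), ("add", None)]

# Hypothetical recipe 2: look up two words in a bilingual dictionary in memory.
TRANSLATION_RECIPE = [("push", "logique"), ("lookup", "dictionary"),
                      ("push", "cours"), ("lookup", "dictionary")]

# Assumed memory contents, chosen only for the illustration.
MEMORY = {
    "v": 3, "t": 4, "h0": 5,
    "dictionary": {"logique": "logic", "cours": "course"},
}

print(run(TRACKING_RECIPE, MEMORY))      # [17]
print(run(TRANSLATION_RECIPE, MEMORY))   # ['logic', 'course']
```

Nothing in the run function changes between the two calls; the machine becomes a "translator" solely because a different list of instructions is handed to it.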

Thus I wish to emphasize that most present research on machine translation 
is addressed not to hardware development, but to the problem of developing a 
recipe that is a set of rules, or a program for automatic translation. In addi- 
tion to the program, certain auxiliary materials are needed. The machine 
must be provided with a bilingual dictionary (usually recorded on magnetic 
tape) and of course it must be provided with the input text for translation. 
This latter operation, at the present time, requires that each character of such 
text be recorded in coded form on punched cards, punched paper tape, or 
magnetic tape, for input into the computer. This coding process is carried out by a 
manual key-punching operation quite similar to typing; this process is expensive 
and slow, comparable in both respects to human translation. 
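
The three ingredients named above (a program, a bilingual dictionary, and coded input text) can be sketched in a few lines of Python. This is a hedged illustration only, not a reconstruction of any group's actual program. The dictionary entries and the function name render are assumptions, and the slash notation for multiple meanings simply imitates the French exhibit printed earlier in this record. No reordering or grammatical analysis is attempted, which is why the result reads as a word-for-word rendering rather than a translation.

```python
# Hypothetical bilingual dictionary: each source word maps to one or more senses.
DICTIONARY = {
    "nous": ["we", "us"],
    "entendons": ["hear", "mean"],
    "la": ["the"],
    "logique": ["logic"],
    "cours": ["course"],
}

SAMPLE = "Nous entendons la logique."


def render(text, dictionary):
    """Word-for-word lookup; alternatives joined by '/', unknown words kept."""
    rendered = []
    for token in text.split():
        word = token.strip(".,;:!?\"'()").lower()
        senses = dictionary.get(word)
        rendered.append("/".join(senses) if senses else word)
    return " ".join(rendered)


print(render(SAMPLE, DICTIONARY))   # we/us hear/mean the logic
```

A word missing from the dictionary passes through unchanged, so gaps in the word list show up directly in the output.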

Now my earlier statement that existing computers are adequate for machine 
translation must be qualified in two major respects. The slow and expensive 
input process just mentioned implies a requirement for automatic character 
recognition in order to transform the contents of the printed page into a code 
that can be processed by machine. The task of character recognition is of po- 
tential use in a wide variety of applications and may be considered as an engi- 
neering problem entirely separate from research on machine translation. In 
the case of machine translation, however, the task of input transcription is 
greatly complicated by the need for recognizing different sizes and types of font 
within the same article, as well as both upper and lower case letters, footnotes, 
italic, several foreign alphabets, and certain symbols. Equations, charts, 
graphs, tables, and illustrations must somehow be reproduced in any machine 
translation output in order to create a high-quality product acceptable accord- 
ing to format and printing standards of scientific journals. Thus existing ma- 
chines can do machine translation except for the task of input and output for 
which at present they are quite inadequate. 

Let us briefly consider the state of the art of machine translation recipe 
development at the present time. Many groups, including that at Ramo Wool- 
dridge, have had a capability during the past 2 or 3 years for producing machine 
translations from Russian to English that appear on the surface to be more 
or less readable and understandable. It is not particularly useful to debate 
here the relative quality of output of the various groups, since all recipes 
presently in use are known to be exceedingly deficient and efforts are in progress to 
effect improvements. It is particularly significant to observe that one can 
produce an English output impressive to the layman at a stage of the total 
research program in which less than one-tenth of the work has been done. This 
ability to create an impressive and even sensational product of marginal sig- 
nificance has apparently led to considerable confusion in the attempts by many 
persons both in and out of the field to assess the state of the art. To paraphrase 
a cynical observation on the institution of marriage, “machine translation is a 
long dull meal with the dessert at the beginning.” The public and customers 
have repeatedly been served a variety of desserts during the past 6 years, but 
the breakthrough demonstration accompanied by newspaper whipped cream 
is finally beginning to go out of fashion. Most serious workers in the field are 
now digging through the long dull meal in a dedicated attempt to solve the 
hundreds of problems that must still be solved. Much progress has been made 
and nothing stands clearly in the way of a highly useful product, eventually. 
The many faceted nature of the problem is such that there never has been a 
single or sudden breakthrough and I think it unlikely that there ever will be. 

The question of criteria which one should apply in order to judge production 
readiness for machine translation arises frequently. The customer is, of course, 
the best judge of whether or not present machine products are acceptable to 
any degree for his purposes. Whether the customer speaks with a single voice 
on this matter I do not know, but a few observations may be useful. 

First, since the quality of the product capable of being produced at this time 
is unknown even within very wide limits, let me skip over this somewhat con- 
troversial point and simply consider the palatability of the output format. 

Present printing equipment auxiliary to general purpose computers provides 
only a single uppercase type font. Consider the readability of a magazine with- 
out bold headlines, columns, lowercase letters, or illustrations of any kind as 
indicative of what the consumer would have to accept as present machine 
translation output. If a requirement exists for present products, then of course 
such products should be produced, but I should think it would be possible to 
carry out relatively simple small-scale tests to resolve the apparently conflicting 
opinions on the usability of such translation prior to allocating large sums of 
money for “production” purposes. Even if machine translation of acceptable 
quality could be produced in acceptable format several other serious questions 
can be raised on the overall environment in which it is used. Translation by 
itself does not solve the problem of distribution, dissemination, and accessibility 
to the scientific community. Machine translation perhaps ought to be regarded 
as one portion of the total problem of communicating scientific information. 
One might conclude, for example, that automatic translation of the entire avail- 
able body of Russian scientific literature makes less sense than certain alterna- 
tives of a more selective nature. 

I shall conclude with a brief description of machine translation research at 
Ramo Wooldridge. For the past 2 years, a general purpose computer program 
for translation of Russian to English has been in use for experimental purposes. 
Approximately 100,000 words of current Russian scientific literature have been 
machine translated and the results analyzed. 

A new recipe for translation which embodies several hundred changes to the 
recipe of 2 years ago will result within the next 3 or 4 months in a greatly im- 
proved machine translation capability. This capability will still be limited 
to a single technical field, specifically physics, due to limitations of dictionary, 
but is expected to yield correct syntactic analyses of 80 to 90 percent of en- 
countered sentences. Problems of multiple meaning will be almost negligible 
because of the restriction to narrow subject matter, but will by no means have 
been solved insofar as application to broader context is concerned. 

A distinctive feature of the Ramo Wooldridge research procedure lies in the 
systematic methods for analyzing the machine translation product in order to 
develop improved translation rules and improved dictionary entries. Major 
emphasis is placed on multiple meaning or semantics. The analysis process itself 
is aided by machine techniques which in effect serve to augment the intuitive 
grasp of human beings for language phenomena. Because neither the linguist 
nor the native speaker has immediately at his command the many thousands of 
individual features of context that determine translation rules (particularly for 
multiple meaning problems) it is especially important to supplement human 
knowledge with exposure and analysis of new problem areas through experimen- 
tal machine translation. Because of the interdependency among the multitude 
of translation rules, it is necessary and desirable to group together for analysis 
purposes classes of problems with similar attributes. It is in this area of 
collecting, grouping, and organizing the large amount of language data pertinent to 
translation problem areas that machine techniques have proved to be especially 
valuable in the Ramo Wooldridge research procedure. 

These same machine-aided research methods have been successfully extended 
to other areas in which the role of natural language is important, such as auto- 
matic indexing, abstracting, dissemination, and information retrieval. 

(Machine translation research at Ramo Wooldridge has been sponsored in part 
under contract AF-30(602)-2036 with the Rome Air Development Center and 
in part under the 1959 and 1960 Ramo Wooldridge general research program.) 


STATEMENT OF HARRY H. JOSSELSON AND ARVID W. JACOBSON, OF THE WAYNE STATE 
UNIVERSITY, DETROIT, MICH., ON PROBLEMS ASSOCIATED WITH THE MACHINE 
TRANSLATION OF RUSSIAN INTO ENGLISH 


This project has been sponsored by the Department of the Navy, Office of 
Naval Research, Information Systems Branch, under project No. NONR-2562 (00). 

In translating from Russian into English, or for that matter from any language 
into any other, it is first necessary to discover in each language what the various 
discrete linguistic units are and, second, how these units are arranged into 
larger units in order to carry a certain amount of information. After both the 
source language, the language from which we translate, and the target language, 
the language into which we translate, have been analyzed, it is necessary to 
establish precisely the interrelationship between both languages on all levels— 
word level, phrase level, sentence level. At this stage it becomes necessary to 
express correspondences between the target language and the source language 
in terms of the former, since it is the end output in which we are interested, 
a finished translation. If we reverse the position of the two languages, i.e., make 
the source language the target language, and vice versa, a different set of 
formulations will be necessary. If we translate Russian into English, we 
arrive at a series of statements expressing the relationship between Russian and 
English in terms of the final product, which is English. In translating from 
English into Russian, a somewhat different set of relationships will emerge, if 
we are interested in obtaining a satisfactory and readable Russian translation. 

Let us illustrate. Here is a simple statement: Russian lacks the grammatical 
category of the article, i.e., Russian does not have the three English words 
“the,” “a,” and “an.” If we translate from English into Russian, we must 
ascertain under what circumstances the English article can be omitted in Russian, 
and if not, what are the conditions under which the English definite and indefinite 
articles are expressed by different linguistic structures in Russian. However, if 
we reverse the process and translate from Russian into English, we must ascer- 
tain the conditions under which we must add or not add in English, a grammat- 
ical category that does not exist in Russian. 

After we have made our separate analyses of both languages and established 
their interrelationships on the necessary levels from the viewpoint of the target 
language, and after we have done that in the symbolism of the natural human 
language, we have to express the whole process in terms of machine language, 
i.e., in routines which a computer can understand. This is the province of the 
mathematician, logician, and the computer specialist, and the problems and 
complexities involved herein will be dealt with later in this paper. Suffice it to 
say here, however, that the linguist working on the preliminary linguistic 
analyses must be fully aware of the limitations and restrictions imposed by the 
computing machine when the latter is applied to linguistic analyses. The 
mathematician, and the machine which he uses as a tool, always look for a com- 
plete and elegant solution: all possible special cases and exceptions must be 
taken into consideration in working out an algorithm. 

We have definite ideas as to the approach to the complex problem of machine 
translation. We feel that the most effective attack would be one where the ma- 
terial was limited to a fairly narrow area in a subject field. We are naturally 
concerned about doing repetitious work, going over the same things in a way that 
has already been done elsewhere. To avoid this overlapping of effort, both in
regard to the method of attack and the areas of investigation, we have diligently 
searched the literature and have personally been in contact with most of the 
centers in which work is being done in this field. We are very pleased to have 
had the cordial and helpful cooperation wherever we have sought contact and
explanation of the work that was being done. 

On the basis of these discussions with other groups, we have become more 
assured that our basic approach to this problem is reasonable and effective. 
Likewise, we are now aware of what is being done here and abroad and can 
thus avoid directly repetitious work. It is, of course, natural that in the begin- 
ning a certain amount of ground common to all investigations had to be covered. 

In accordance with our decision to limit the subject field, we have chosen an 
area in mathematics dealing with partial differential equations. This restriction
will limit not only the vocabulary but also, we believe, the structural diversity
inherent in the general language. Accordingly, we have chosen three Russian
articles: one from Математический сборник, 1955, “The solution of
problems of Cauchy for certain types of systems of linear partial differential
equations,” by V. M. Borok; a second from Успехи математических наук, 1953,
“Fourier transforms of rapidly increasing functions and questions of the uniqueness
of the solution of Cauchy’s problem,” by I. M. Gel’fand and G. E. Shilov;
and a third from Успехи математических наук, 1954, “On the solution of
Cauchy’s problem for regular systems of linear partial differential equations,”
by A. G. Kostyuchenko and G. E. Shilov. For these there exist translations
prepared by the American Mathematical Society. This gives us parallel texts 
to work with. Incidentally, we have made changes in the English translations 
in order to bring them closer to Russian forms while keeping them in acceptable
English. We feel that the parallel text approach has numerous advantages in 
relating the two languages structurally as well as in the specific resolution of 
ambiguities in meaning and form. 

Another feature of our program is that we are aiming at a careful linguistic 
analysis of the material prior to any effort to program work for a computing 
machine. What we wish to say is that our main area of attention concerns 
the structural analysis of the language for the purposes of mechanical trans- 
lation. We are not concentrating especially on the problems of glossaries; we 
also confine our efforts to such questions as multiple meaning, insertion, or
deletion. We believe that these questions can only be resolved in the overall 
analysis of structure, perhaps with the aid of semantic considerations. Our 
effort is concerned with the determination of clause boundaries and the isolation
of other lexical groups which must be carried out if automatic procedures are 
to be arrived at for their translation into another language. To generally 
describe our attack then, we would say that (1) we are working with a small 
subject field; (2) we are concentrating mainly on problems of ambiguity, both 
on the lexical as well as the morphological level, and of rearrangement, laying 
other equally important problems aside for the moment; and (3) our main 
procedure involves structural analysis and the use of parallel texts. With 
these rules as our guides, we aim at developing practical translation procedures 
yielding fluent and accurate text. 

We believe that postediting in actual production translation work will be 
necessary, but, as more experience is gained and procedures are refined, the 
amount of postediting will diminish. 

We will now describe our general procedure. It is characterized by three 
distinct starting points for processes which ultimately merge. These starting 
points are (1) preparation of the program, (2) preparation of the dictionary, 
and (3) preparation of the text on cards. Figure 1 represents an outline of our 
general procedure. 

The input to the procedure loop involving the program consists of the cur- 
rent program tape and a text tape which comprises the words encountered 
in the text (in text order) and the dictionary information about each word. 
The output is a printout of the translation effected which, when subjected to 
a postediting procedure, will be the basis for program changes. Thus, ultimately,
a new current program tape will result.

The postediting may indicate the necessity for certain dictionary changes, as 
well as program changes, and this leads us to the dictionary procedure loop. 
A dictionary based on words encountered in a number of texts (mathematical, 
in our case) is prepared on cards. The information includes the Russian word, 
grammatical and syntactic information, and the English translations of the 
word. These cards will be sorted into alphabetical order and converted to tape, 
and, in the process, the abbreviated card codes will be expanded to the binary 
code called for in the program. Obviously, it will be continuously necessary 
to add new entries to the dictionary, as well as to correct entries on the basis 
of experience gained through program runs. The additional entries necessary
will be indicated by the failure to find words in the dictionary which appear 
in a new text being processed. Thus, there is a provision in the loop to update 
the dictionary and, simultaneously, to fill in the gaps in the text tape. Since 
the text will be in alphabetical order for the purpose, it must be re-sorted to
text order for input into the program loop. 

Finally, we note that the preparation of the text involves punching the text 
on cards (one line per card), converting to tape (one line per record), numbering
each word (according to article, page, line, sentence, and occurrence in
sentence), sorting the words into alphabetical order, performing a dictionary 
lookup, and producing a tape (which will be later completed) containing the 
words found in the dictionary with the information therein, together with the 
words not found, and a separate tape of the words not found, from which the 
aforementioned addition to the dictionary will be prepared. 
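
The text-preparation and dictionary loops described above can be pictured with a small modern sketch. It is offered only as an illustration under stated assumptions: the names `Token` and `prepare_text`, the use of a running index in place of the article, page, line, sentence, and occurrence numbering, and the transliterated sample words are all hypothetical and are not taken from the project's program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    position: int  # running index; stands in for (article, page, line, sentence, occurrence)
    word: str


def prepare_text(lines, dictionary):
    """Number the words, sort them alphabetically, look them up, restore text order."""
    tokens = []
    for line in lines:  # one line per card / record
        for word in line.split():
            tokens.append(Token(len(tokens), word.lower()))

    # Alphabetical order allows a single sequential pass over a sorted dictionary.
    alphabetical = sorted(tokens, key=lambda t: (t.word, t.position))

    looked_up = []
    missing = []
    for tok in alphabetical:
        entry = dictionary.get(tok.word)  # None marks a gap to be filled later
        if entry is None:
            missing.append(tok.word)
        looked_up.append((tok, entry))

    # Re-sort to text order for input into the program loop.
    text_tape = sorted(looked_up, key=lambda pair: pair[0].position)
    return text_tape, sorted(set(missing))


# Hypothetical two-entry dictionary (transliterated Russian, illustrative codes).
sample_dictionary = {
    "reshenie": {"class": "nominal", "english": ["solution"]},
    "zadachi": {"class": "nominal", "english": ["problem"]},
}
sample_tape, sample_missing = prepare_text(["Reshenie zadachi", "Koshi"], sample_dictionary)
```

In this sketch `sample_missing` plays the part of the separate tape of words not found, from which additions to the dictionary would be prepared.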

Since some notable work has already been accomplished in the machine trans- 
lation field by other groups, and since we have greatly profited therefrom, we 
are now concentrating on investigating extensive syntactic analysis routines 
in order to produce a better than word-for-word translation. Our work consists 
of testing some techniques of MT which, essentially, are experiments with 
sophisticated concepts of syntax conceived from the point of view of machine
translation. 

We are working with particular problems in syntax, using the computer to 
test our routines. We will be using an IBM 709 computer, with 32,000 words 
of storage and two data synchronizer channels with two 12-tape units on line.
This computer is located at Chrysler Corp. offices in Detroit, which has been 
very cooperative in allowing us to use their computer in our work. Our tech- 
nique is to isolate a problem by manually simulating or by excluding all those 
parts of a translation program which are not directly concerned with the prob- 
lem under consideration. 

In the syntactic analysis of a sentence, our method of proceeding is to group 
elements together in units, each unit having a definite syntactic function, analogous
to the original parts of speech; e.g., a noun with its modifiers forms a
nominal block. These blocks may in turn be grouped into larger blocks, until, 
in the ultimate case, only one large block remains. In one sense, this nested 
system of blocks with grammatical labels defines the sentence structure.

If, in each blocking procedure, enough information is stored in the grammar
code describing that block so that further blocking need never refer to the internal
structure of any block used, except insofar as this structure is described
by the block grammar code, then, in each blocking pass, one deals only with the 
blocks which have been constructed in the preceding pass. One may conceive 
of these blocks as forming an image of the preceding block structure under a 
blocking “transformation.” Thus, the sequence of sentence “images” will define
the syntactic structure. For experimental purposes only the two current images 
need be retained in the storage of the computer unless it happens that an early 
blocking step was incorrect. If a later stage shows this to be the case, one 
must recall the last correct image and redo all succeeding passes. 
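
A minimal sketch may help to picture the sequence of sentence images. It assumes, purely for illustration, that a block is a pair of a grammar code and its text, and that a blocking pass fuses adjacent pairs of codes. The names `merge_pairs` and `sentence_images` and the code labels are hypothetical. Each pass consults only the grammar codes of the preceding image, never the inside of a block.

```python
from typing import List, Tuple

Block = Tuple[str, str]  # (grammar code, text covered by the block)


def merge_pairs(image: List[Block], left: str, right: str, new_code: str) -> List[Block]:
    """One blocking pass: fuse each adjacent (left, right) pair of codes into one block."""
    result = []
    i = 0
    while i < len(image):
        if i + 1 < len(image) and image[i][0] == left and image[i + 1][0] == right:
            result.append((new_code, image[i][1] + " " + image[i + 1][1]))
            i += 2
        else:
            result.append(image[i])
            i += 1
    return result


def sentence_images(sentence: List[Block], rules) -> List[List[Block]]:
    """Return every image in order; image 0 is the coded sentence itself."""
    images = [list(sentence)]
    for left, right, new_code in rules:
        images.append(merge_pairs(images[-1], left, right, new_code))
    return images


# Hypothetical codes and an English gloss in place of a Russian phrase.
phrase = [("PREP", "for"), ("MOD", "linear"), ("NOUN", "equations")]
rules = [("MOD", "NOUN", "NOMINAL"), ("PREP", "NOMINAL", "PREP_BLOCK")]
images = sentence_images(phrase, rules)
```

The sketch keeps all images so that an earlier correct one can be recalled and later passes redone. A version keeping only the two current images would discard `images[:-2]` after each pass.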

This, of course, requires the construction of new grammar codes, describing 
syntactic functions of blocks. In most cases these will be quite similar to 
already existing codes. 

In order to test our routines on the computer, for the present we have decided 
to simulate the dictionary lookup, ignore the homograph problem (in certain 
instances), and ignore the English translation output problem temporarily in 
the initial stages of programming by limiting the output to an indication of
block boundaries. At this point it is appropriate to mention that our ultimate
goal, as pointed out above, is a sentence image whose elements are properly
labeled blocks (with indication of syntactic function, e.g., subject, and gram- 
matical class membership, i.e., capacity for syntactic function) which can be 
manipulated by syntactic rules and related to semantic rules to produce high- 
quality translation. 

As a result of syntactic analysis from the MT point of view, it has been 
necessary to depart from the traditional part-of-speech scheme and consider 
word classes and subclasses in terms of syntactic function and distribution 
(e.g., adverbs which modify only those items to the right of them, like очень
“very”). These new form classes are presented in detail in figure 2. 

These considerations of syntactic function and distribution, in addition to 
purely morphological criteria, have resulted in a reshuffling of form classes of 
Russian used in traditional or even descriptive grammar. Thus, a major word
class, conjunctions, is divided into “ambiguous, nonambiguous,” “finite, infinite,”
“contains ‘бы,’ does not contain ‘бы.’” A new major word class, modifiers, cuts
across traditional divisions by including adjectives, participles, numerals, de- 
monstrative pronouns, and possessive pronouns. A comparison of the new and 
old grammar classes is contained in figure 3.

In addition to a word class code, agreement and government codes become
part of the grammar code. The government code has two aspects: case government,
and the less obvious prepositional government. The government code
relates to each other those syntactic elements which are closely dependent rather
than just loosely collocated.

Since we are going to use an IBM 709, all the information contained in the
grammar code is coded in binary form. So far the entire grammar code occupies 
three 36-bit machine words. The actual binary code is such that each gram- 
matical distinction is, insofar as possible, represented by a single binary bit. 
Illustrations of the details of how the individual words are coded are presented 
in figures 4a-g, while figure 5 illustrates the coding of an actual Russian sentence
taken from our mathematical text. 
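
The principle of one binary bit per grammatical distinction, spread over three 36-bit machine words, can be sketched as follows. The feature names and bit positions in `FEATURES` are invented for the illustration; the actual assignments are those of figures 4a-g, which are not reproduced here.

```python
WORD_BITS = 36  # width of an IBM 709 machine word
N_WORDS = 3     # the grammar code occupies three words

# Hypothetical layout: feature name -> bit position within the 108-bit code.
FEATURES = {
    "nominal": 0,
    "modifier": 1,
    "singular": 36,
    "plural": 37,
    "genitive": 40,
    "governs_genitive": 72,
}


def encode(features):
    """Set one bit per grammatical distinction and return the three machine words."""
    words = [0] * N_WORDS
    for name in features:
        bit = FEATURES[name]
        words[bit // WORD_BITS] |= 1 << (bit % WORD_BITS)
    return words


def has_feature(words, name):
    """Test a single distinction with one shift and one mask."""
    bit = FEATURES[name]
    return bool((words[bit // WORD_BITS] >> (bit % WORD_BITS)) & 1)


def as_octal(words):
    """Render each 36-bit word as 12 octal digits, the customary 709 notation."""
    return [format(w, "012o") for w in words]


code = encode(["nominal", "plural", "genitive"])
```

With a layout of this kind, a test for agreement between two items reduces to a bitwise AND of the corresponding words, which is one reason a single-bit representation is convenient.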
As an example of one of our program steps, we will now describe our nominal 
block routine, illustrated in figure 6. When a nominal is detected in a left to 
right scan of the sentence, the nominal blocking routine is entered for the 
purpose of packaging the nominal with its preceding modifiers, including possible
adverbs modifying these preceding modifiers. The first question asks
whether there is any string of modifiers preceding the nominal. Intermediate 
adverbs modifying members of the string are included in the package, since they 
do not terminate the search. Commas and ambiguous conjunctions (и, а, или, но,
and others) are also skipped because their function is to separate the series 
of modifiers.

If there are no modifiers, then we conclude that the nominal block consists
of just one element, namely, the nominal. If there are modifiers, then we must
ask whether they agree with the nominal. If any of the modifiers fails to 
agree with the nominal (in number, case, and gender), then it is necessary to 
investigate this string in terms of more complex types of agreement, e.g., a 
numeral in the string of modifiers, or a compound nominal block, or adjectives 
whose nature requires them to be singular while modifying plural nouns, as in
для одного или нескольких уравнений, “for one or several equations.”
These eventualities will be considered in the subsequent complex agreement
routine.

If, however, the modifiers all agree with the nominal, we can mark the boundaries
of the nominal block, with the leftmost modifier as the preceding boundary,
and the nominal as the following boundary. At this point, the entire block is 
given a grammar code so that it may be treated as a unit. The grammar 
code is obtained as follows: the agreement code of the block is the minimal agree- 
ment among the modifiers and the nominal, and the government code of the block 
is the government code attributed to the nominal.

It may be possible to extend the left boundary of the nominal block to include
preceding adverbs, if these adverbs are known to belong to the modifiers which 
follow them. If such is the case, the preceding boundary is changed, and 
marked at the leftmost adverb satisfying the condition. In either case, we exit 
from the routine with a multiple word nominal block marked by preceding 
and following boundaries and grammar coded as indicated. 
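
The nominal block routine just described can be approximated by the following sketch. It is not the routine of figure 6. It rests on several assumptions made only for illustration: each item carries a set of possible (number, case, gender) readings; the "minimal agreement" of the block is taken to be the intersection of those sets; the field `modifies_right` stands for the knowledge that an adverb belongs to the modifier which follows it; and the sample words are transliterated.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Reading = Tuple[str, str, str]  # (number, case, gender)


@dataclass
class Item:
    text: str
    cls: str  # "nominal", "modifier", "adverb", "comma", "conjunction", ...
    agreement: FrozenSet[Reading] = frozenset()
    government: str = ""
    modifies_right: bool = False


# Items which separate or qualify modifiers and so do not terminate the search.
SKIPPED = {"adverb", "comma", "conjunction"}


def nominal_block(items: List[Item], n: int) -> Optional[Tuple[int, int, FrozenSet[Reading], str]]:
    """Return (left, right, agreement, government) for the nominal at index n.

    None means that some modifier fails to agree, so the string must be
    handed to a separate complex agreement routine (not sketched here).
    """
    agreement = items[n].agreement
    left = n
    i = n - 1
    while i >= 0 and (items[i].cls == "modifier" or items[i].cls in SKIPPED):
        if items[i].cls == "modifier":
            agreement = agreement & items[i].agreement
            if not agreement:
                return None
            left = i  # the boundary moves only when a modifier is found
        i -= 1
    if left < n:
        # Extend the left boundary over adverbs known to belong to the modifiers.
        while left > 0 and items[left - 1].cls == "adverb" and items[left - 1].modifies_right:
            left -= 1
    return left, n, agreement, items[n].government


neuter = frozenset({("sg", "nom", "n"), ("sg", "acc", "n")})
phrase = [
    Item("ochen", "adverb", modifies_right=True),
    Item("prostoe", "modifier", neuter),
    Item("uravnenie", "nominal", neuter, government="none"),
]
block = nominal_block(phrase, 2)
```

For the sample phrase the sketch marks the block from the adverb through the nominal and leaves both neuter singular readings open, since the modifier does not resolve the ambiguity between nominative and accusative.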
The nominal block routine just described and a number of similar routines, 
the purpose of which is to identify other properly labeled blocks comprising 
the sentence, are conceived of as preliminary passes to the main syntax passes. 
The purpose of the latter is to produce, by the application of proper routines, 
a sentence image, which in turn will yield, after being subjected to cleanup 
passes to resolve the few remaining lexical and morphological ambiguities, a 
better than word-for-word translation.

In conclusion, we wish to remark that machine translation, as a practical production
of acceptable text, is conceived, for some time to come, in specific subject
areas. The limiting of the area to a narrow science field reduces the complexity 
of the problem very markedly. This simplification occurs in the size of the 
glossary, number of multiple meanings, lesser variability in structural forms, and 
in other ways.

It is for the above reasons that we have chosen a limited science field for
study, that is, mathematics. We are thus producing a necessary machine vocabulary
in mathematics as well as a translation program which takes into
account the syntactic peculiarities of mathematical style of writing. Furthermore,
we chose the field of mathematics knowing that this field of knowledge is
central in the sciences and vital to the national effort. 


STATEMENT OF PROF. W. P. LEHMAN, MACHINE LANGUAGE TRANSLATION STUDY,
UNIVERSITY OF TEXAS, AUSTIN, TEX.


NEED FOR TRANSLATION BY MECHANICAL MEANS 


Translation by machine is the only practicable method of meeting the urgent 
problem of making scientific and technical information published in foreign 
languages available to American science and technology. The volume of publication
in numerous areas is increasing rapidly, as any survey demonstrates.
Moreover, with the spread of scientific interest to all areas of the world, publications
of value are being produced in a greater variety of languages than before.
We cannot assume, even today, that significant publications will appear only in 
French, German, or English. Russian publications are now almost exclusively 
in Russian; Chinese scientists publish in Chinese; and as the level of technology
increases in countries with a strong nationalistic tradition, the United Arab 
Republic, Indonesia, and so on, we can expect important works published in the 
national languages of these countries. If one moves from the technological 
sphere to that of intelligence, even the seven languages alluded to above are inade- 
quate to keep abreast with activities today. They will not help us in all of 
South America, in much of Africa, nor in important sections of Asia. No one 
can master all the important languages while specializing in one of the sciences.
For rapid access to publications these will have to be provided for him in his
native language.

At least for the present there is no likelihood that the number of important 
languages will diminish. For nationalism is complicating the world’s linguistic 
situation. India, for example, is abandoning English in favor of Hindi. New 
countries in Africa may follow India’s pattern. 

Nor will an artificial language—e.g., Esperanto or even an international nat- 
ural language—solve the problem. All artificial languages have been based 
on those of Europe, so that they are not readily mastered by non-Europeans.
Even the most widely used has a pitifully small number of speakers. 

On the other hand, in the nationalistic world of today, languages associated 
with great powers will be suspected for potential cultural domination. It is 
unlikely that native speakers of English will adopt Russian as their technical 
language. We can expect the Russians to maintain a similar position toward 
English. 

A further possibility of keeping abreast with current publication would be 
employment of a corps of human translators. This possibility is dubious because
of expense and unmanageability. Translators would have to master technical
fields as well as different languages in order to produce adequate translations.
The number of translators needed to deal with Russian nuclear physics
would probably not be many fewer than the number of nuclear physicists our 
universities have trained. 

We are left then with the necessity of devising mechanical means of transla- 
tion, because of the number of languages that exist, the wide number of tech- 
nical areas, and the tremendous volume of publication. 


METHODS OF TRANSLATING BY MECHANICAL MEANS 


The methods which will prove most reliable in any new field cannot be com- 
pletely foreseen. Machine translation is a very new field and accordingly no 
one can predict which methods will eventually be most successful. In the present
stage of study, workers in machine translation are testing various approaches
in the approved manner of any scientific pursuit.

It is clear that machine translation will combine two complex fields: lin- 
guistics and data processing. Techniques of data processing are no older than 
contemporary computers. Linguistics is fortunately somewhat older, yet many 
of its problems are unsolved. For successful machine translation we will have 
to develop both fields and train workers in the techniques of both fields. The 
very scarcity of students capable in either field, not to speak of both combined, 
may indicate the difficulties we face.

Our group at the University of Texas is about to complete its first year of
relatively intensive research. One of our obligations was study of research 
in the field; in our quarterly reports we have discussed our findings. If we 
may summarize them very briefly, it is by saying that the early work in ma- 
chine translation relied most heavily on linguistic analysis as the underlying 
technique which was most thoroughly understood. Workers in machine transla- 
tion assumed that a complete grammar of one language in contrast with an- 
other—e.g. English in contrast with Russian—could simply be programed on a 
computer. The chief shortcoming of this approach is caused by the nature of 
language; its possibilities are indefinitely large. Accordingly, a complete an- 
alysis is impossible. (Even if one made an exhaustive analysis, the intro- 
duction of a new term for one of the items constantly being discovered or de- 
veloped, would require revision of the computer program.) Since no linguistic 
analysis can be complete, any program built on a linguistic analysis will re- 
quire revisions. These will become so complex that the program itself will be 
unmanageable. Accordingly, a more imaginative use of computers will need to 
be made. 

We must develop for their use “general purpose translation processes.” Com- 
puters have excellent capabilities for handling data; to exploit these we must 
use computers for processing linguistic data. This approach has the added advantage
that computer theory has provided models which workers in machine
translation can adapt for their purposes. (An extended statement of this ap- 
proach will be given in our fourth quarterly report, which will be produced for 
our sponsor this May.) 

As the programs underlying “general purpose translation processes” are being 
worked out, a tremendous amount of linguistic material will have to be analyzed 
and the results incorporated in our programs. 


PRESENT STATUS OF RESEARCH 


Full-scale research in machine translation is at best 5 years old. When one
considers the complexity of language, the lack of experience with computers, 
and our lack of understanding of the process known as translation, one can 
scarcely expect the present status of research to be very advanced. Even the 
workers in machine translation in the Soviet Union, who have received much 
more support than have their counterparts in our country, have scarcely pro- 
gressed beyond initial theory. We have, however, made sufficient progress to 
assure ourselves that translation by computer is not an impossibility, and to con- 
clude that it can and must be more than 90-percent effective. 

Valuable work is being carried on in a number of centers in this country, as
well as abroad. In view of the recency of support for work in machine transla- 
tion and the small amount of support in contrast with the magnitude of the 
task and its potential contribution, the research has been successful. 

Translation by machine is tied to other developments, notably the production 
of scanning devices which will introduce texts into computers by electronic 
means. Unless such devices are available, speed of translation will be limited to 
the speed by which texts can be provided to computers by manual means. 

Fortunately work on scanning devices is also promising, and there is every 
likelihood that they will be available when adequate techniques for machine 
translation have been produced. 


EXTENT OF GOVERNMENT SUPPORT 


Our group at the University of Texas has received great encouragement from 
the university administration. We may demonstrate its interest by citing a 
recent statement by the provost of the university, in which he called improved
communication our first immediate concern, common to industry and education. 
With other members of the administration he has followed closely the progress 
of our work. 

Since the university has still to obtain adequate financial support for a large 
computer, our work would be impossible without support from our sponsor, the 
U.S. Army Signal Research and Development Laboratory. Our relations with 
the Laboratory have been productive in many ways. We have learned of the 
[...] Government support is then essential for work in machine translation. Moreover,
machine translation will require continuation of central administration. 

When successful, machine translations need be made at only one center. For 
if a Chinese article on the mineral resources of Tibet is once translated, it will
not need further translation. Accordingly, machine translation should be arranged
under an agency of the Government, such as the Department of Defense,
the Library of Congress, the National Science Foundation, the Department of
Health, Education, and Welfare, or a separate foundation devoted to linguistics.
Such an organization might circulate all translated materials gratis, or for a fee. 

Until machine translation is perfected, however, research should have increased
Government support. For before the advent of machine translation, research
in linguistics was dependent on a segment of the money available to the
humanities, which never compared with that provided for the social sciences,
let alone the natural sciences. Even this year it is difficult to find support for 
research into language which may have a direct bearing on the work in machine 
translation.


CONCEPTS OF USE OF AN OPERATING SYSTEM


The perfecting of machine translation will be of tremendous service to all 
dealing with intelligence and research of any kind. It will however have addi- 
tional uses. But to achieve these, specially designed computers will be produced 
for machine translation. The work of our group is beginning to indicate the 
direction of their design. 

When machine translation is accomplished, the volume of materials trans- 
lated will be so great that even specialists will be hard pressed to keep up with 
the material in their fields. Computers therefore will not be required merely to 
translate, but also to provide abstracts or indices of their outputs. Data processing
lends itself readily to such uses. We have studied these extensions of
machine translation and see no difficulty in their application. 

Somewhat more complex will be the evaluation of data. As our understanding
of computer possibilities increases, however, this use will also follow. 

As material is translated, it will be abstracted, and the abstracts correlated.
If, for example, a research scholar is interested in the potential of an area, e.g.,
Ethiopia, he will not merely be able to request translations of material out- 
lining its resources and their development, he will also be able to request an 
analysis of its resources coupled with the area’s economic production, popula- 
tion and other data which will provide a rounded report. The data will be 
processed for him in such a way that he can readily form judgments on the 
area’s status, its fields of progress and neglect. If a scholar is involved in an 
intricate problem, such as the genetic investigation of a species, or the at- 
mosphere on another planet, before undertaking his research he will be able to 
determine the extent of knowledge concerning his subject, if an adequate system 
for obtaining data has been set up. A further use will be for translation of
English materials into other languages. Backward countries, and even coun- 
tries like Turkey or the United Arab Republic, find it difficult to keep up with 
publications containing information on technical and scientific advances. Their
best likelihood of receiving adequate materials will come from a massive trans- 
lation effort, and this can best come from machine translation. Such a program 
will have great political importance and international benefits for us. 

With continued support, machine translation will be accomplished satisfac- 
torily in the next decade. It is encouraging that the Committee on Science and 
Astronautics is concerned with its research and development. Besides continued 
support, optimum use of machine translation will depend on organization ready 
to disseminate its results for the various requirements of our country. 


RESEARCH IN MACHINE TRANSLATION AT THE UNIVERSITY OF CALIFORNIA, 
BERKELEY 


Sydney M. Lamb


1. THE NATURE OF THE RESEARCH


The machine translation (MT) project at the University of California started 
in October 1958, under the financial sponsorship of the National Science Founda- 
tion. Having been in existence for slightly more than a year and a half, it is 
the newest of the groups working on Russian-English MT.

Perhaps the one feature which most clearly characterizes the work of the
project, in contrast to that of the field in general, is the separation made between 
(a) the designing of procedures to be followed by the machine (for parsing 
sentences, selecting target representations, etc.) and (b) the accumulation of 
the information and the analysis on which such procedures must be based. Such 
separation necessarily imposes a requirement on the proper sequence of research 
operations. That is, the information must be obtained before the procedures 
are worked out, at least with regard to their details. To attempt to work out 
the procedures first is not only unnecessarily difficult and time-consuming but 
also generally useless, because of the extensive revision that must be done after 
the data are in. 

The foregoing facts would seem to be too obvious to mention, were it not that 
they have generally been overlooked. That is, in the general approach to the 
MT problem, no clear separation is made between the two types of research 
activity. The main reasons for this situation are (a) lack of awareness of 
how extensive an amount of linguistic information is needed, or a belief that 
it is already available; and (b) an inability (real or supposed) to determine
what kind of information is needed in any other way than by observing the 
failures of trial procedures. The result of the general approach has been that 
the bulk of the efforts expended by the various projects has been devoted to 
the design of experimental machine procedures, none of them based on adequate 
information. 

The project at the University of California is thus relatively distinctive in 
that it is concentrating on the linguistic analysis the results of which must 
eventually underlie any usable procedure. 

The kind of information which is needed can be determined from a knowledge 
of the operations that must be performed in order to translate. In terms of
the theory of linguistic structure used by the California project, these operations 
are: 

(1) Dictionary search.—This operation involves two processes which must 
be performed together, namely segmentation of the text to be translated into 
the lexical items for which dictionary entries will have been set up, and the 
location of those entries or the obtaining of the information contained in them. 
The items for which dictionary entries exist may be called lexes. Types of 
lexes include prefixes, bases, derivational suffixes, inflectional suffixes, and 
uninflected words. Segmentation of prefixes and derivational suffixes is con- 
fined to those which are capable of occurring in new formations. 

(2) Lexemic assignment.—A lexeme is the representation of a lex on a
higher level of linguistic structure called the morphemic level. It is a sequence 
of one or more morphemes. Lexemes may have different lexes, or allolexes, e.g., 
the English lexes <child> and <childr> (the latter occurring in <children>), 
which represent the single lexeme {child}. Conversion from lexes to their cor- 
responding lexemes is automatic, and can therefore be handled during the process 
of dictionary look-up, except in the case of those lexes which represent more than 
one lexeme (e.g., the suffix -a of Russian, which can represent genitive singular,
nominative singular, nominative plural, present gerund, etc.). The correct
lexemic assignment can usually be handled fairly easily through identification 
of essential features of the immediate environment. 

(3) Selection of target representations.—This operation is the heart of the 
translation process. The meaning carried by the source-language lexemes must 
be provided with a representation in terms of lexemes of the target language, 
arranged in a suitable order. The operation is difficult because (a) for any 
pair of languages, many source-language lexemes will have different target 
representations in different environments; and (b) the lexemes of the target
language should often be arranged in an order different from that of the source 
lexemes represented. The second type of difficulty is of greater or less importance 
depending upon how different the syntactic structures of the two languages 
are. In dealing with both sources of difficulty, the translating machine must 
generally make use of a fairly considerable amount of information about the 
syntactic structures of the individual sentences involved. It must, therefore, as 
a part of this general operation, work out the syntactic relationships present in 
the sentences. This process may be referred to as parsing. Obviously a pro- 
cedure for parsing can be effective only if it incorporates the results of an ade- 
quate syntactic analysis of the source language. By syntactic analysis, as 
opposed to parsing, is meant a type of linguistic investigation leading to 
the determination of the syntactic structure of a language. In terms of the
dichotomy of information and procedure to which attention is drawn at the
beginning of this paper, syntactic analysis provides the information on the basis 
of which a procedure for parsing can be devised. 

(4) Selection of target allolexes.—Many English lexemes have different
allolexes of which one must be selected in any given environment. The linguistic
problems involved in this operation are elementary.
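
Operations (1) and (2) above can be illustrated by a minimal sketch. Nearly everything in it is an assumption made for illustration: the toy table `LEX_TABLE`, the lexeme labels `{plural}` and `{3rd-singular}`, and the longest-match strategy with backtracking. The statement does not describe the project's actual look-up procedure; only the lexes <child> and <childr> and the lexeme {child} are taken from the text.

```python
# Toy lex table (hypothetical): each lex maps to the lexeme(s) it can represent.
LEX_TABLE = {
    "child": ["{child}"],
    "childr": ["{child}"],                  # allolex of the same lexeme, as in <children>
    "en": ["{plural}"],
    "book": ["{book}"],
    "s": ["{plural}", "{3rd-singular}"],    # one lex representing more than one lexeme
}


def segment(word):
    """Split a word into lexes, trying the longest dictionary match first.

    Backtracks to a shorter match when the remainder cannot be segmented.
    Returns a list of lexes, or None if no segmentation exists.
    """
    if word == "":
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in LEX_TABLE:
            rest = segment(word[end:])
            if rest is not None:
                return [head] + rest
    return None


def lexemes(word):
    """Pair each lex of a word with its candidate lexemes (None if unsegmentable)."""
    lexes = segment(word)
    if lexes is None:
        return None
    return [(lex, LEX_TABLE[lex]) for lex in lexes]
```

In this sketch `lexemes("children")` yields the lexes `childr` and `en`, each with a single lexeme, whereas the `s` of `books` comes back with two candidate lexemes, so that the choice must still be made from the immediate environment as described under operation (2).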

Of the various types of problems that must be solved before machine transla- 
tion from Russian to English can become a reality, by far the most difficult are 
those concerned with operation (3) above (“Selection of Target Representations”).
For their solution it is not so much procedures as linguistic information
on which to base the procedures that is needed. Since devising procedures
of this kind is easier than obtaining the linguistic information, especially if it is 
done after the information is available, the greatest present need in MT research 
is for a large amount of detailed information relevant to the problem of selecting 
representations of Russian lexemes. This problem is concerned primarily with 
the semantic structure of Russian. All of the other areas of research and development
connected with machine translation of Russian are, by comparison,
easy to handle. Therefore, the research policy of the group is based on the 
conviction that the only important obstacle to operational machine translation 
of Russian is lack of sufficient knowledge of the structure of that language. 


2. ACCOMPLISHMENTS SO FAR 


Listed below are 10 of the most important systems and research tools which 
the group has been producing that will contribute to the future of MT. Each of 
these items is believed to be either the best of its kind or the only one of its 
kind in the MT field. 

(1) A maximally effective segmentation system for Russian. The system is 
based on principles of economy, with regard to both dictionary look-up and 
translation proper, as explained in a forthcoming publication [3], and separates 
derivational suffixes and prefixes which are capable of occurring in new forma- 
tions, thus making it possible for an automatic dictionary to keep pace with 
newly created technical terminology. In connection with this problem, a de- 
tailed study of the productivity of Russian derivational suffixes has recently been 
completed. 

(2) Two systems for syntactic coding, based on this segmentation system. 
One of them is a relatively simple mnemonic code and the other is machine 
oriented and more detailed. The latter is directly convertible into the former. 

(3) A Russian-English dictionary for MT: At present it includes some 16,000 
lexes representing an estimated 14,000 lexemes. From the bases and productive 
derivational affixes can be formed well over 30,000 morphemically different stems. 
Counting all inflected forms separately, the estimated vocabulary coverage is 
over 300,000 words. Since much of the information which must eventually be 
included in the dictionary has yet to be obtained, the various dictionary entries 
are incomplete to varying degrees. 

(4) An automatic dictionary system for use on an IBM 704 (or any similar 
computer), which provides for a vocabulary coverage of up to half a million 
words and accomplishes look-up (including segmentation into lexes) at a rate 
of about 125 words per second (7,500 words per minute) [1]. The development 
of this system settles decisively the question of whether or not special equip- 
ment is necessary for dictionary storage in an economical automatic translation 
system. If a cost of $270 per hour ($4.50 per minute) is assigned to the use of 
a 704, dictionary look-up, far from being prohibitively expensive, will cost only 
0.06 of a cent per word. 

(5) An analysis system for obtaining linguistic information from Russian 
scientific text. Analysis sheets prepared according to this system show the results
of the following operations: (a) graphemic coding for IBM equipment of
both Cyrillic and non-Cyrillic material, (b) segmentation into lexes, (c) assignment
of syntactic code symbols according to the human-oriented grammar code,
and (d) determination of the preferred translation for each sentence and assignment
of each of its features (including order changes) to individual lexemes.

(6) A text analysis according to this system of about 30,000 words of Russian 
text. (An additional 70,000 words of text have been punched for analysis by 
the project’s automatic text analyzer.)

(7) A linguistic data gathering program, for obtaining organized data from
analyzed text. This program, which operates on the 704, can select every 
example of whatever Russian lexes or syntactic classes are specified (up to 
16 items at one time), and provide a list in which the examples are grouped 
according to the different preferred translations which were assigned, each 
example accompanied by several preceding and following words of context, 
completely analyzed as indicated above (5). The program places the linguistic 
material and the accompanying analysis in a special format allowing it to be 
maximally easy for linguists to study. 

(8) A graphemic coding system for Russian scientific text, suitable for pro- 
duction use. It is so designed that Russian text can be punched directly by 
a properly trained operator without preediting. It is somewhat more refined 
than the similar scheme incorporated in the text analysis system (5) and accommodates
the Cyrillic, Greek, and Latin alphabets, Arabic numerals, mathematical
and chemical symbols, punctuation, italicization, capitalization, and
subscripts and superscripts. 

(9) A catalog of order-change situations; i.e., situations in which English
representations must be arranged in an order different from that of the Russian 
lexemes represented, for the sake of readability and/or intelligibility. The 
situations have been classified and assigned to associated lexemes, making 
possible an efficient means for enabling the computer to identify them and 
to produce suitable English arrangements. The associated lexemes for an order- 
change situation are those which have been selected as constituting a small 
unified class whose presence is a necessary condition for the occurrence of the 
situation. 

(10) A method for automatic parsing of Russian text in order to provide 
information needed in selecting English representations and working out order
changes, together with a sizable body of information on Russian syntax with 
which the method is to be implemented. This system operates in terms of 
syntactic constructions, the construction being defined in accordance with con- 
cepts of modern structural linguistics. From a description of the necessary 
syntactic information can be made a syntactic table to which access can be 
had by direct addressing, using syntactic class symbols as addresses. At the 
entry in the table for each lexeme class is given the information as to what 
grouping those lexemes enter into for any construction in which members of 
their class presuppose those of the other. The placement of grouping instruc- 
tions under the peripheral constituent class of each construction guarantees 
maximum efficiency by keeping to an absolute minimum the searching for items 
in the environment which may not be there. 
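
The table look-up described in item (10) might be sketched as follows. This is an illustration under stated assumptions, not the project's method: the class symbols `ADV`, `ADJ`, and `N`, the two table entries, and the stack-based grouping are all hypothetical. Each rule is stored under the peripheral constituent class and is reached directly by using the class symbol as the key.

```python
# Hypothetical syntactic table. Key: class symbol of the peripheral constituent.
# Value: (class it presupposes to its right, class of the resulting group).
SYNTAX_TABLE = {
    "ADV": ("ADJ", "ADJ"),   # adverb + adjective -> adjective group
    "ADJ": ("N", "N"),       # adjective + noun   -> noun group
}


def parse(items, table=SYNTAX_TABLE):
    """Group items in a single left-to-right pass.

    items: list of (class_symbol, node) pairs, smallest constituents first.
    Returns the list of (class_symbol, node) groups that remain.
    """
    stack = []
    for cls, node in items:
        # Attach any waiting peripheral constituents, innermost first.
        while stack:
            top_cls, top_node = stack[-1]
            rule = table.get(top_cls)            # direct look-up by class symbol
            if rule is None or rule[0] != cls:
                break
            stack.pop()
            cls, node = rule[1], (top_node, node)
        stack.append((cls, node))
    return stack
```

Because the rules live in the table, a further construction can be tried by passing a different `table` argument while `parse` itself stays unchanged.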

This parsing system is one of several which have recently been brought forth. 
It is similar to the “Pass Method” of Paul Garvin (Ramo-Wooldridge) in that
it begins with the ultimate constituents and makes successively larger groupings; 
it resembles the “Predictive Analysis” procedure of Ida Rhodes (National 
Bureau of Standards) in that it operates predominantly on a left to right basis, 
without multiple passes; it is like the “Syntagmatic-Syntactic” approach of
Michael Zarechnak (Georgetown University) in recognizing an important differ- 
ence between intraphrase and interphrase relationships; and it agrees with 
the “Dependency” method of David Hays (Rand Corp.) in keeping the program 
separate from the syntactic information upon which it operates. This last 
feature is considered to be of special importance since it enables one to augment 
and modify the syntactic information without reprograming. 
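
The look-up cost quoted in item (4) of this section follows directly from the two figures given there. A short check, using only the stated rate (125 words per second) and the stated machine charge ($270 per hour); the function name is illustrative:

```python
def cost_per_word_cents(words_per_second, dollars_per_hour):
    """Machine cost of look-up per word, in cents."""
    words_per_hour = words_per_second * 3600
    return dollars_per_hour * 100 / words_per_hour


words_per_minute = 125 * 60                 # 7,500 words per minute
dollars_per_minute = 270 / 60               # $4.50 per minute
cents = cost_per_word_cents(125, 270)       # 27,000 cents / 450,000 words = 0.06
```

The result agrees with the 0.06 of a cent per word given in the statement.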


3. RELATIONS WITH OTHER PROJECTS 


In the past the MT field has been hampered by imperfect cooperation among 
the various projects, resulting primarily from a partial failure of some groups to 
keep well informed on the work being done elsewhere. It has been the policy of 
this project since its beginning to follow closely the progress of other projects in 
order that advantage could be taken of results achieved by them. In implement- 
ing this policy the project leader has visited the projects at Wayne State Uni- 
versity, Massachusetts Institute of Technology, Harvard University, Georgetown 
University, the National Bureau of Standards, the Rand Corp., Ramo-Wooldridge, 
and the project which formerly existed at the University of Michigan. The group 
at the University of California, in turn, has had the pleasure of visits from repre- 
sentatives of Georgetown University, the National Bureau of Standards, the 
National Physical Laboratory (of Teddington, England), the Rand Corp., the 
University of Washington, Wayne State University, and Ramo-Wooldridge. Exchanges
of visits are expected to continue since they provide the best means of
communication among projects. 

The results of cooperation are contributions made by others to the work of this 
project and contributions of this project to others. 

(1) Contributions from others.—In addition to various minor ideas and techniques
of the type which have been developed independently several times, the
work of others on (a) programing systems, and (b) automatic parsing systems 
has been of particular interest to this project. Detailed studies have been made 
of the COMIT system (developed at MIT) and the simulated linguistic computer 
(developed by A. F. R. Brown, of Georgetown University), as well as IPL-V 
(an automatic programing system which was developed for purposes other than 
machine translation). In addition, a study of the system being developed by 
Hugh Kelly, of the Rand Corp., will be made in the near future. The extent to 
which it will prove feasible to make direct use of one or more of these systems 
has not yet been fully determined, but COMIT definitely has value as a research 
tool. The project has also taken a great interest in automatic parsing systems 
which have been developed elsewhere, especially in view of the fact that their 
procedural details have been worked out to a greater extent than in the case of 
the Berkeley project. It is not unlikely that ideas contained in some of them 
can be incorporated into the California system in the course of its further refinement.
With this in mind, the “Predictive Analysis” system of Ida Rhodes (the
National Bureau of Standards) has been studied, and highly detailed information 
on Paul Garvin’s system has been provided by Ramo-Wooldridge. In addition, 
it is expected that the large amount of pertinent linguistic data which have been 
amassed at the Rand Corp. will prove very helpful. Other contributions to the 
work of this project have included (a) from the Rand Corp., one of the basic 
ideas used in the U.C. dictionary system (as explained in reference [1]), and 
(b) from MIT, 100,000 words of English text on magnetic tape and an accompanying
search program.

(2) Contributions to others.—It is to be hoped that each of the 10 items listed 
above (sec. 2) will prove useful to other projects. 


4. FUTURE WORK: RUSSIAN TO ENGLISH


Future plans in the development of the Russian-to-English translation system 
call primarily for continuation of the various types of work now underway.
There are several of these. In the first place, three of the items listed above in 
section 2 are of an expanding nature; that is, more information is incorporated 
into them as it is accumulated. These are the machine-oriented grammar-coding 
system, the dictionary, and the parsing system. Although already useful in 
their present form, these items will not be complete until a great deal of addi- 
tional information has been obtained. Thus much of the future work will be 
concerned with adding to the information contained in them. In the case of 
the grammar-coding system, the additional information will be concerned pri- 
marily with semantic properties for which Russian lexemes will be coded. The 
dictionary is not expected to require the addition of many more entries in the 
future since its present vocabulary coverage is already extensive. It is incom- 
plete, however, with regard to the information contained in the entries. The 
greatest lack is of information pertaining to the conditions under which alterna- 
tive English representations are to be selected. The dictionary and accompany- 
ing machine system are so designed that information in the entries can be added 
to or modified with great ease. The parsing system requires the incorporation 
of further syntactic information, and certain procedural details have yet to be 
worked out. 

The most valuable means of obtaining linguistic information is considered to 
be analysis of texts. In order to make possible the increased volume of analysis 
needed in the future as well as to free personnel for other tasks and to reduce 
human error, the project is developing an automatic text analyzer to be used on 
the 704 computer. The text analyzer will incorporate the Russian-English dic- 
tionary and a modified version of the dictionary lookup program which has been 
developed for automatic translation systems. Taking advantage of the informa- 
tion that has been accumulated up to a given point, the text analyzer will do 
much of the work that has been done previously by graduate linguists, but will 
leave blanks or offer choices on its analysis sheets wherever lack of formulated 
knowledge makes human judgment still necessary. The human text analyzer’s
job will then be only to fill in the blanks and to choose between alternatives.
As more and more information is accumulated and incorporated into the system,
it will eventually be complete enough to justify being converted from a text 
analyzer to an automatic translator. At some time in advance of this date, however,
the output of the text analyzer will be such that persons with only a rudimentary
knowledge of Russian will be able to produce from it accurate, high-quality
translations with great ease. According to present estimates, the text
analyzer, running on a 704, will be able to operate at a rate of about 2,000 words
per minute.
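
The division of labor between the automatic text analyzer and the human analyst can be sketched as follows. The sketch is hypothetical throughout: the transliterated Russian lexes, the English glosses, the store `KNOWN`, and the three status labels are assumptions chosen for illustration, and nothing here reproduces the project's program for the 704.

```python
# Hypothetical store of information formulated so far:
# lex -> English representations known for it.
KNOWN = {
    "reakciya": ["reaction"],
    "v": ["in", "into", "at"],
}


def analyze(lexes, known=KNOWN):
    """Produce one analysis-sheet row per lex.

    A row is (lex, status, value), where status is
    "filled" (one known representation), "choose" (several alternatives),
    or "blank" (nothing formulated yet).
    """
    sheet = []
    for lex in lexes:
        options = known.get(lex, [])
        if len(options) == 1:
            sheet.append((lex, "filled", options[0]))
        elif options:
            sheet.append((lex, "choose", list(options)))
        else:
            sheet.append((lex, "blank", None))
    return sheet


def needs_human(sheet):
    """Lexes for which the human analyst must fill a blank or make a choice."""
    return [lex for lex, status, _ in sheet if status != "filled"]
```

As entries are added to `KNOWN` and alternatives are narrowed, `needs_human` returns fewer items, which mirrors the gradual conversion from text analyzer to translator described above.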

It is planned that additional linguistic data gathering programs more sophisticated
than the present one will be constructed in order to further facilitate the
process of obtaining information from analyzed texts. 
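
The grouping performed by the present data gathering program (item (7) in sec. 2) can be sketched as below. The record layout, the names `gather` and `MAX_ITEMS`, and the default context width are assumptions; only the limit of 16 items per run and the grouping by preferred translation come from the statement. For brevity the context here carries only the neighboring lexes, not their full analysis.

```python
from collections import defaultdict

MAX_ITEMS = 16   # the statement gives 16 items as the limit for one run


def gather(text, wanted, context=2):
    """Collect examples of the wanted lexes or syntactic classes.

    text:    list of (lex, syntactic_class, preferred_translation) records
    wanted:  set of lexes and/or class symbols to select
    Returns {(lex, translation): [(preceding_lexes, following_lexes), ...]}.
    """
    if len(wanted) > MAX_ITEMS:
        raise ValueError("at most 16 items may be specified at one time")
    groups = defaultdict(list)
    for i, (lex, cls, translation) in enumerate(text):
        if lex in wanted or cls in wanted:
            before = [rec[0] for rec in text[max(0, i - context):i]]
            after = [rec[0] for rec in text[i + 1:i + 1 + context]]
            groups[(lex, translation)].append((before, after))
    return dict(groups)
```

Grouping by the pair of lex and translation places all examples that received the same preferred translation side by side, which is the arrangement a linguist needs in order to look for the conditions governing each choice.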

Other work in progress includes (a) an exhaustive study of the English repre- 
sentation of Russian case suffixes; (b) the design of systems for English allolexy 
and morphographemics (i.e., for the production of correct English graphemic 
forms for inflectional suffixes, etc.) ; and (c) a study of Russian chemical nomen- 
clature and methods for converting it into English. 


5. FUTURE WORK: CHINESE TO ENGLISH 


The project is now making plans to undertake research leading to machine 
translation of Chinese technical literature. Since much of the work being 
done for the Russian-to-English system is of fairly general applicability, it can 
be very helpful with regard to other source languages, including Chinese. The 
Chinese-to-English research, therefore, is viewed not as constituting a separate 
project but as an extension of the work already in progress. The development 
of a Chinese-to-English system can proceed much more rapidly under these 
conditions than would be possible if carried on as a separate project. In par- 
ticular, the following systems which have been or are being constructed as a 
part of the Russian-to-English research are adaptable to Chinese: (a) the dictionary
lookup system (the first part of which can be considerably simpler for
Chinese than it is for Russian); (b) the linguistic data gathering program (see
No. (7) in sec. 2); (c) the automatic parsing procedure, which must, of course, 
be provided with a different body of syntactic information; and (d) the system 
of English allolexy and morphographemics. 

In addition, various techniques which have been developed in the course of 
the project’s work should be of great value in making the work on Chinese proceed
faster than would otherwise be possible. Important among such techniques
are those of (a) devising syntactic coding systems, (b) compiling dictionaries,
(c) analyzing texts, and (d) constructing automatic text analyzers.

As indicated at the beginning of this paper, it is the policy of the project to 
concentrate on obtaining the necessary linguistic information before devising 
detailed procedures which require it. This policy is even more important in 
the development of a Chinese-to-English system than in the case of Russian, for 
two reasons. First, whereas Russian and English are related languages having 
numerous structural similarities, Chinese and English are structurally very 
dissimilar, making necessary a much more complete description of Chinese 
structure to serve as a basis for a translation system. Second, the great wealth 
of grammatical studies which are available on Russian, thanks to years of 
devoted effort on the part of many grammarians, is, by comparison, almost 
entirely lacking for Chinese. It is the conviction of the project leader that by 
far the most important piece of work that should be done as a first step leading 
to automatic translation of Chinese is a relatively complete description of Chinese 
syntactic structure. 

Because of the importance of the need for syntactic analysis of Chinese, and 
in view of the large amount of time which is known to be required for detailed 
syntactic analysis when done by conventional means, an investigation has re- 
cently been undertaken to determine the feasibility of constructing a computer 
program for doing syntactic analysis of languages automatically. The results of 
this investigation are so encouraging that they might almost be described as 
dramatic. It is intended to proceed with working out the details of the system 
in order to apply it to the syntactic analysis of Chinese. Not only should this 
make it possible to achieve a description of Chinese syntax faster than would 
otherwise be possible; it should also sharply reduce the amount of time required
for the development of future machine translation systems for other language
pairs.


STATEMENT ON MACHINE TRANSLATION RESEARCH BY ARIADNE LUKJANOW,
MACHINE TRANSLATION, INC., ALEXANDRIA, VA. 


I. NEED FOR MACHINE TRANSLATION 


Machine Translation, Inc. (MTI) has never undertaken a study of this problem 
and therefore is not in a position to assess accurately the actual need for machine 
translation since it lacks the statistical and documentary data necessary for this 
task. We have, however, followed reports on this subject, the contents of which 
give a general idea. 

A UNESCO report in 1957 stated that 50 percent of scientific literature is
in languages which more than half of the world’s scientists cannot understand.
Since Sputnik I, a crash program of the National Science Foundation has been 
turning out an estimated 100 million words per year of technical Russian trans- 
lation, but this is less than half of what is needed. It is impossible to obtain 
enough translators, especially trained personnel, required to work in technical 
fields. Such qualified people as there are cannot be spared from their chosen 
professions. Even if there were enough human and financial resources to have 
the work done by translators, they could not hope to handle all the material 
that is produced by the world’s technical minds. Thus, many scientific discoveries
remain inaccessible to the scholars who might otherwise benefit from
them. 

Since manual translation cannot keep up with the volume of material, research 
in machine translation is being undertaken by a number of universities and 
corporations in the United States and Western Europe. The cost of this research 
exceeds $3 million a year according to Dr. Yehoshua Bar-Hillel in a study under- 
taken for the Office of Naval Research. He estimates that the Soviet Union is 
spending a comparable amount. 

II. METHODS 


A number of approaches have been employed in machine translation over a 
period of several years. These approaches have been either determined by a 
specific objective or influenced by the backgrounds of the research workers. For 
instance, a specific objective of a fast though not polished translation might lead 
to an automatic dictionary technique. The background of a researcher can 
influence his approach in three basic ways: (1) He can be influenced by machines 
in such a way that only the development of a new language computer would lead 
to acceptable results. (2) Another researcher may attempt to completely simu- 
late human reasoning on a standard computer; i.e., the analytical approach. 
(3) A third approach would be an attempt to make machine translation processes 
as mechanical and utilitarian as possible by adapting the approach to the
capabilities of the machine and by clearly defining the man-machine relationship;
i.e., the pragmatic prediction approach. Since present-day computers are best
suited to repetitive mathematical operations and man is still the best thinker,
the approach will properly wed the activities of both man and machine. All
of the thinking will be expressed in the form of codes in the dictionary and the 
machine will merely utilize the dictionary in the manner provided by the system. 

In order to translate at all, any system must provide a solution to the problem 
of transferring structure, form, meaning, and location from the source to the 
target language. 

To the best of our knowledge all systems to date have in some way treated 
this fourfold transfer on separate levels or in several stages, without establishing 
the overall relationship of the constituents of the transfer. 

The new approach to machine translation developed by Machine Translation, 
Inc., takes cognizance of all of these points in the form of the unified or com- 
bined single transfer process. Therefore we call our new system the unified 
transfer system (UTS).


III. PRESENT STATUS OF RESEARCH

General 

Since 1954 extensive research in machine translation has been carried out by 
a number of universities and private organizations in this country, England, 
Italy, and the Soviet Union. 

On August 20, 1958, a demonstration of the code matching technique (CMT) 
took place on the premises of the Corporation for Economic & Industrial Research,
Inc. (CEIR). The system tested was conceived and developed by
Ariadne Lukjanow, then at Georgetown University. 

The demonstration consisted of several articles of Russian chemical literature 
which were translated on an IBM 704 computer. The National Science Foundation
reported: “The evaluation by chemists on the intelligibility and completeness
of the text was positive.”

In a subsequent demonstration on November 20, 1958, before the International 
Scientific Congress, articles from other fields of knowledge were demonstrated 
using improved programs. 

These tests were not designed to produce perfect translation, but primarily to 
prove that machine translation is a practical possibility. The CMT system is 
experimental and possesses too many practical limitations to render it useful 
except in a scientific sense. The decimal coding employed, the segmental indi- 
vidual operations, long-fixed records, complete linguistically oriented logic, ex- 
tensive use of macroprograming and subroutines make the CMT system a purely 
experimental model. However, the CMT not only proved that a system of this 
type yields translation and that the principles of the approach are valid, but 
provided us with the knowledge and experience necessary to produce an opera- 
tional system. 

The CMT experiment and experience gained in preparing it were valuable 
for the development of a production model system which we call the unified 
transfer system (UTS). 

A. Some aspects and present status of the unified transfer system (UTS) 

Translation is a process of transferring one set of data into another. As ap- 
plied to languages, it is a fourfold transfer process: 

1. The transfer of the function of words; 

2. The transfer of the form of words; 

3. The transfer of the meaning of words; and

4. The transfer of the distribution of words.

In the UTS these four transfers are considered as a single transfer process. 
In order to achieve this transfer, we have devised a classification system for each 
of the transfers expressed in the form of a code. We have then merged these 
codes into unified code patterns.

The UTS is completely worked out and described in manuscript form. It is 
ready for programing and testing. The main features of the UTS: 

1. The fourfold transfer process of translation (function, form, meaning, and 
distribution) has been transformed into a single unified transfer process through
the establishment of the relationship between the constituents of each transfer 
and expressed in a 12-digit code. 

2. The individual codes are incorporated into 478 master code patterns. Each 
code pattern determines all possible meaning, form, functional, and environ- 
mental conditions for a class of words. Therefore, each word in the dictionary 
will carry a pattern reference number and the dictionary operations are thereby 
greatly reduced. These master code patterns are to be stored as part of the 
program. 

3. The operations in the algorithm are performed on the basis of relationships 
between the patterns rather than identification of the individual constituents of 
either pattern. It is based on simple ideas of arithmetic progression and comparison
of codes. The logic of the system is more abstract, less linguistically
oriented in its final instructions, and therefore more suited for machine operations.

4. Maximum utilization of the computer is obtained by octal notations in codes, 
and the use of indirect addressing, buffering systems, simultaneous input/output, 
increased storage capacity, and conversion instructions. 

5. In addition to the manuscript describing the UTS, a set of instructions has 
been prepared. It is ready for actual programing and testing on a general-purpose
computer.

6. A dictionary of 24,000 canonical entries (word stems) is ready for automatic
conversion into a paradigmatic dictionary. It is cross-referenced to code
patterns, as required by the UTS.
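
The arrangement described in features 1, 2, and 4, in which each dictionary entry carries only a pattern reference number and the codes are kept in octal, might look roughly like the sketch below. The statement does not give the internal layout of the 12-digit code or the contents of any pattern, so the equal split into four three-digit fields, the pattern number 17, the stems, and all names are assumptions made purely for illustration. It is an incidental observation, not made in the statement, that 12 octal digits occupy 36 bits, the word length of the IBM 704 and 709.

```python
def pack_code(function, form, meaning, distribution):
    """Pack four 3-digit octal fields into one 12-digit octal code (assumed layout)."""
    code = 0
    for field in (function, form, meaning, distribution):
        if not 0 <= field < 0o1000:
            raise ValueError("each field must fit in three octal digits")
        code = (code << 9) | field          # 3 octal digits = 9 bits
    return code


def unpack_code(code):
    """Recover the four fields from a packed code."""
    return tuple((code >> shift) & 0o777 for shift in (27, 18, 9, 0))


# Hypothetical master pattern table, stored once as part of the program.
MASTER_PATTERNS = {
    17: [pack_code(0o1, 0o2, 0o3, 0o4)],
}

# Hypothetical dictionary: each stem carries only a pattern reference number.
DICTIONARY = {"stem_a": 17, "stem_b": 17}


def codes_for(stem):
    """Follow a stem's pattern reference number to the shared master pattern."""
    return MASTER_PATTERNS[DICTIONARY[stem]]
```

Under these assumptions `format(pack_code(1, 2, 3, 4), "012o")` gives the 12-digit octal string `001002003004`, and two stems with the same reference number share one stored pattern instead of repeating it in each entry.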

B. Capacity and application of the unified transfer system (UTS)


The UTS has not been designed for any particular computer. It can be pro- 
gramed for any large general-purpose computer. We have established that on 
the IBM 709 this system would produce between 25,000 and 40,000 words per 
hour. On the IBM 7090 this number would be increased to more than 100,000 
words per hour. On Stretch it could be implemented with a translation rate of 
close to 3 million words per hour. 

In addition, the UTS is applicable not only to Russian-English but to other 
combinations of languages. We have already investigated its applicability to 
Russian-German, German-English, and Chinese-English. This system has all the 
necessary provisions for automatic indexing of translated material and contains 
sufficient data for the development of an abstracting system. Both these processes,
indexing and abstracting, can be incorporated into the UTS in such a
fashion that they would not require any substantial increase of translation time.
We have likewise investigated the possibility of simultaneous translation from 
one language into several on the same pass through the machine. We found it 
feasible and practical. 

All the work done on the UTS thus far has been accomplished without any financial
support from either the Government or private sources. The programing
and testing, as well as the compilation of large dictionaries, are entirely beyond 
the means of Machine Translation Inc., both in terms of money and personnel. 

Provided sufficient funds were forthcoming, the Russian-English production 
phase of translation could be accomplished within 6 to 9 months. Simultaneous 
translation could be achieved within 3 or 4 months. Indexing would require 
3 or 4 months; abstracting, for which some research is necessary, would take 
from 6 to 8 months. Instead of working on these phases separately, we would 
be prepared and willing to work on them simultaneously. In that case all these 
tasks could be accomplished within a year. 


IV. CONCEPTS OF USE OF AN OPERATING SYSTEM 


In our opinion the primary objective of machine translation is to provide the 
U.S. Government with an efficient tool, i.e., an operational system for translating, 
indexing, and abstracting in the fields of technical and natural sciences. Once 
an operational system has been conceived, developed, and tested, an organization 
for its use should be established. Such an organization, by virtue of the prob- 
lems involved, has to be a combination of computer installation and publishing 
house. It should be capable of producing and publishing translations, indexes, 
and abstracts. In addition, it should be able to perform the following services: 

1. Efficient distribution of materials produced. 

2. Publication of bulletins at regular intervals (weekly, biweekly, and/or 
monthly) bringing to the attention of the Government and private industrial 
research organizations the highlights of the available translated material in 
various fields of technical knowledge. 

3. Maintain an efficiently organized library.

Only an organization capable of providing these facilities will properly utilize
an operational system for machine translation and provide service to the U.S. 
Government, research and educational institutions, as well as to private 
industry. 

ALEXANDRIA, VA., April 20, 1960. 

MACHINE TRANSLATION, INC. 


OPTICAL SOCIETY OF AMERICA,
Washington, D.C., May 24, 1960. 
Hon. OVERTON BROOKS,
Chairman, Science and Astronautics Committee, 
House of Representatives, Washington, D.C. 

Dear Mr. Brooks: We wish to bring this letter to your attention and to have
it inserted in the Congressional Record, in order to correct a misstatement, which
was made before your committee at the hearings held during the week of May 9 
on mechanical translations. 
The statement made by Dr. Gilbert W. King of the International Business
Machines Corp., which we wish to correct is the following:

“Speed is one of the reasons machine translation is necessary because current 
human translation of important Russian work is too slow. For example, the 
translated Journal of the Optical Society of America (sic) reaches him 18 
months after the appearance of the Russian original.” 

Since the Optical Society of America received its grant from the National 
Science Foundation in March 1959 for support of the translation program of the 
Russian journal, Optika i Spektroskopiya (translation issue entitled “Optics and 
Spectroscopy”), all 12 1959 issues of this journal have been translated and 
distributed as of May 25, 1960, to the members of the Optical Society of 
America, under the general direction of our assistant secretary, Miss P. R. 
Wakeling. The total number of subscriptions to this translation journal is about 
5,500, the largest of any scientific Russian translation journal, we believe. 
Translation copies of the January 1959 Russian issue were mailed to OSA mem- 
bers in August 1959, of the May 1959 Russian issue in December 1959, and of the 
December 1959 Russian issue on May 25, 1960. The Russian January, February, 
and March 1960 issues are in various translation and publication stages and 
are expected to be mailed to OSA members in July 1960. All of these time 
intervals of 7 to 5 to 4 months and hopefully to 3 months for delivery of a 
translated issue, are certainly well within the 18 months interval mentioned by 
Dr. King. In fact, to date this entire translation project by OSA has been in 
existence less than 18 months.

Really, the more important problem to consider in connection with “current 
human translation of important Russian work” is the elapsed time interval 
required by such a translation. This is the time interval which extends from the 
date of receipt of the Russian journal in the executive office of the Optical 
Society of America, through the time needed for the translation, scientific edit- 
ing, composition of English translation, printing, and publishing, to date of 
mailing to OSA members of the translation. In the case of the Russian journals 
of October, November, and December 1959, the elapsed time for these translations
was 4½, 4, and 4 months, respectively. We hope to decrease this interval
to 3 months with later issues as indicated in the following table: 


Translation schedule of Optics and Spectroscopy

Volume, number, date on    Date received in OSA    Date of mailing to      Elapsed
Russian journal            executive office        OSA members             time 1

VII, No. 4                 Jan. 13, 1960           May 17
VII, No. 6                 Jan. 28, 1960           May 25, 1960
VIII, No. 1, January       [illegible] 16, 1960    June 1960 (expected)

1 Elapsed time from date of receipt of Russian journal in OSA office to date of
mailing to OSA members of the translation.
2 Estimated.
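
The elapsed-time figure defined in footnote 1 can be checked with simple date arithmetic. The following is a minimal sketch in Python, using the receipt and mailing dates shown in the table for Volume VII, No. 6. The function name `elapsed_months` and the 30.44-day average month are assumptions made for illustration; the letter does not say how its figures were rounded.

```python
from datetime import date

# Average length of a calendar month in days; an approximation chosen for
# illustration, not a convention stated in the letter.
AVERAGE_DAYS_PER_MONTH = 30.44


def elapsed_months(received: date, mailed: date) -> float:
    """Approximate months between receipt of an issue and mailing of its translation."""
    return (mailed - received).days / AVERAGE_DAYS_PER_MONTH


# Volume VII, No. 6: received Jan. 28, 1960; translation mailed May 25, 1960.
received = date(1960, 1, 28)
mailed = date(1960, 5, 25)

print((mailed - received).days)                     # days between the two dates
print(round(elapsed_months(received, mailed), 1))   # approximate months
```

On this approximation, the interval for the December 1959 issue comes to a little under 4 months, in line with the figure cited in the letter.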


In support of our publication schedule, attached are copies of translations 
of January 1959 through November 1959 issues of the Russian translation 
journal, Optics and Spectroscopy. The translation of the December issue is 
due to be mailed on May 25, 1960. 

In conclusion, while we agree with Dr. King that speed in translation is im- 
portant and are working toward this objective ourselves by aiming to lower our 
elapsed time per translation of an individual Russian issue from 4 to 3 months, 
the translations of certainly 11 and probably 12 1959 issues of the Russian 
journal, Optika i Spektroskopiya, have been put out by the Optical Society of 
America in periods very much shorter than the 18-month period, claimed by 
Dr. King in his testimony. 

Sincerely, 
Mary Warga, Executive Secretary.